This article aims at reviewing recent empirical and theoretical developments usually grouped under the term 
Econophysics. Since its name was coined in 1995 by merging the words "Economics" and "Physics", this new 
interdisciplinary field has grown in various directions: theoretical macroeconomics (wealth distributions), 
microstructure of financial markets (order book modelling), econometrics of financial bubbles and crashes, 
etc. In the first part of the review, we begin with discussions on the interactions between Physics, Math- 
ematics, Economics and Finance that led to the emergence of Econophysics. Then we present empirical 
studies revealing statistical properties of financial time series. We begin the presentation with the widely 
acknowledged "stylized facts" which describe the returns of financial assets - fat tails, volatility clustering, 
autocorrelation, etc. - and recall that some of these properties are directly linked to the way "time" is taken 
into account. We continue with the statistical properties observed on order books in financial markets. For 
the sake of illustrating this review, (nearly) all the stated facts are reproduced using our own high-frequency 
financial database. Finally, contributions to the study of correlations of assets such as random matrix theory 
and graph theory are presented. In the second part of the review, we deal with models in Econophysics 
through the point of view of agent-based modelling. Amongst a large number of multi-agent-based models, 
we have identified three representative areas. First, using previous work originally presented in the fields of 
behavioural finance and market microstructure theory, econophysicists have developed agent-based models of 
order-driven markets that are extensively presented here. Second, kinetic theory models designed to explain 
some empirical facts on wealth distribution are reviewed. Third, we briefly summarize game theory models 
by reviewing the now classic minority game and related problems. 
Part I 

I. INTRODUCTION 

What is Econophysics? Fifteen years after the word 
"Econophysics" was coined by H. E. Stanley by a merg- 
ing of the words 'Economics' and 'Physics', at an interna- 
tional conference on Statistical Physics held in Kolkata 
in 1995, this is still a commonly asked question. Many 
still wonder how theories aimed at explaining the phys- 
ical world in terms of particles could be applied to un- 
derstand complex structures, such as those found in the 
social and economic behaviour of human beings. In fact,
physics as a natural science is supposed to be precise
or specific; its predictive power is based on the use of a
few but universal properties of matter which are suffi-
cient to explain many physical phenomena. But in social
sciences, are there analogous precise universal properties
known for human beings, who, contrary to funda-
mental particles, are certainly not identical to each other
in any respect? And what little amount of informa-
tion would be sufficient to infer some of their complex
behaviours? There is a positive effort towards answer-
ing these questions. In the 1940s, Majorana had taken
a scientific interest in financial and economic systems. He
wrote a pioneering paper on the essential analogy be-
tween statistical laws in physics and in social sciences
(di Ettore Majorana (1942); Mantegna (2005, 2006)).
However, during the following decades, only a few physi-
cists like Kadanoff (1971) or Montroll and Badger (1974)



had an explicit interest in research on social or eco-
nomic systems. It was not until the 1990s that physicists
started turning to this interdisciplinary subject, and in
the past years, they have made many successful attempts
to approach problems in various fields of social sciences
(e.g. de Oliveira et al. (1999); Stauffer et al. (2006);
Chakrabarti et al. (2006)). In particular, in Quantita-
tive Economics and Finance, physics research has begun 
to be complementary to the most traditional approaches 
such as mathematical (stochastic) finance. These various 
investigations, based on methods imported from or also 
used in physics, are the subject of the present paper. 



A. Bridging Physics and Economics 

Economics deals with how societies efficiently use 
their resources to produce valuable commodities and dis-
tribute them among different people or economic agents
(Samuelson (1998); Keynes (1973)). It is a discipline
related to almost everything around us, starting from 
the marketplace through the environment to the fate of 
nations. At first sight this may seem a very different 
situation from that of physics, whose birth as a well de- 
fined scientific theory is usually associated with the study 
of particular mechanical objects moving with negligible 
friction, such as falling bodies and planets. However, a 
deeper comparison shows many more analogies than dif- 
ferences. On a general level, both economics and physics 
deal with "everything around us", albeit with differ-
ent perspectives. On a practical level, the goals of both 
disciplines can be either purely theoretical in nature or 
strongly oriented toward the improvement of the quality 
of life. On a more technical side, analogies often become 
equivalences. Let us give here some examples. 

Statistical mechanics has been defined as the 



"branch of physics that combines the prin- 
ciples and procedures of statistics with the 
laws of both classical and quantum mechan- 
ics, particularly with respect to the field of 
thermodynamics. It aims to predict and ex- 
plain the measurable properties of macro- 
scopic systems on the basis of the properties 
and behaviour of the microscopic constituents 
of those systems."

The tools of statistical mechanics or statistical physics
(Reif (1985); Pathria (1996); Landau (1965)), which in-
clude extracting the average properties of a macroscopic 
system from the microscopic dynamics of the systems, are 
believed to prove useful for an economic system. Indeed, 
even though it is difficult or almost impossible to write 
down the "microscopic equations of motion" for an eco- 
nomic system with all the interacting entities, economic 
systems may be investigated at various size scales. There- 
fore, the understanding of the global behaviour of eco-
nomic systems seems to need concepts such as stochas-
tic dynamics, correlation effects, self-organization, self- 
similarity and scaling, and for their application we do 
not have to go into the detailed "microscopic" descrip- 
tion of the economic system. 

Chaos theory has had some impact in Economics mod-
elling, e.g. in the work by Brock and Hommes (1998) or
Chiarella et al. (2006). The theory of disordered systems
has also played a core role in Econophysics and the study of 
"complex systems". The term "complex systems" was 
coined to cover the great variety of such systems which 
include examples from physics, chemistry, biology and 
also social sciences. The concepts and methods of sta- 
tistical physics turned out to be extremely useful in ap- 
plication to these diverse complex systems including eco- 
nomic systems. Many complex systems in natural and 
social environments share the characteristics of compe- 
tition among interacting agents for resources and their 
adaptation to a dynamically changing environment (Parisi
(1999); Arthur (1999)). Hence, the concept of disordered
systems helps for instance to go beyond the concept of
representative agent, an approach prevailing in much of
(macro)economics and criticized by many economists (see
e.g. Kirman (1992); Gallegati and Kirman (1999)). Mi-
nority games and their physical formulations have been 
exemplary. 

Physics models have also helped bring new theo-
ries explaining older observations in Economics. The
Italian social economist Pareto investigated a century
ago the wealth of individuals in a stable economy
(Pareto (1897a)) by modelling them with the distribu-
tion $P(>x) \sim x^{-\alpha}$, where $P(>x)$ is the number of peo-
ple having income greater than or equal to $x$ and $\alpha$ is
an exponent (known now as the Pareto exponent) which 
he estimated to be 1.5. To explain such empirical find- 
ings, physicists have come up with some very elegant 
and intriguing kinetic exchange models in recent times, 
and we will review these developments in the compan- 
ion article. Though the economic activities of the agents 
are driven by various considerations like "utility maxi-
mization", the eventual exchanges of money in any trade
can be simply viewed as money/wealth conserving two-
body scatterings, as in the entropy maximization based
kinetic theory of gases. This qualitative analogy seems
to be quite old and both economists and natural scien-
tists have already noted it in various contexts (Saha et al.
(1950)). Recently, an equivalence between the two maxi-
mization principles has been quantitatively established
(Chakrabarti and Chakrabarti (2010)).
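
To make the picture of "money-conserving two-body scatterings" concrete, here is a minimal sketch of one possible exchange rule: two randomly chosen agents pool their money and split the pool at random. The function name, the number of agents and the splitting rule are illustrative assumptions, not the specific models reviewed in the companion article.

```python
import random

def kinetic_exchange(n_agents=500, n_trades=100000, seed=5):
    """Toy money-conserving exchange: each trade reshuffles the pooled
    money of two distinct agents (assumed rule, for illustration only)."""
    rng = random.Random(seed)
    money = [1.0] * n_agents              # everybody starts with one unit
    for _ in range(n_trades):
        i, j = rng.sample(range(n_agents), 2)
        pool = money[i] + money[j]        # conserved in the "collision"
        share = rng.random()              # random split of the pool
        money[i], money[j] = share * pool, (1.0 - share) * pool
    return money

money = kinetic_exchange()
total = sum(money)                                   # stays equal to n_agents
below_mean = sum(m < 1.0 for m in money) / len(money)  # most agents end below the mean
```

Under this assumed rule the total amount of money is conserved by construction, while the individual holdings spread out into a strongly skewed distribution, in the spirit of the gas analogy described above.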

Let us discuss another example of the similarities of in- 
terests and tools in Physics and Economics. The friction- 
less systems which mark the early history of physics were 
soon recognized to be rare cases: not only at the microscopic
scale - where they obviously represent an exception due
to the unavoidable interactions with the environment -
but also at the macroscopic scale, where fluctuations of
internal or external origin make a prediction of their
time evolution impossible. Thus equilibrium and non-
equilibrium statistical mechanics, the theory of stochas-
tic processes, and the theory of chaos, became main tools
for studying real systems as well as an important part of
the theoretical framework of modern physics. Very inter-
estingly, the same mathematical tools have presided over
the growth of classic modelling in Economics and more
particularly in modern Finance. Following the works of
Mandelbrot and Fama in the 1960s, physicists from 1990 on-
wards have studied the fluctuation of prices and univer-
salities in the context of scaling theories, etc. These links 
open the way for the use of a physics approach in Fi- 
nance, complementary to the widespread mathematical 
one. 



B. Econophysics and Finance 

Mathematical finance has benefited a lot in the past 
thirty years from modern probability theory - Brownian 
motion, martingale theory, etc. Financial mathemati- 
cians are often proud to recall the most well-known source 
of the interactions between Mathematics and Finance: 
five years before Einstein's seminal work, the theory of 
the Brownian motion was first formulated by the French
mathematician Bachelier in his doctoral thesis (Bachelier
(1900); Boness (1967); Haberman and Sibbett (1995)),
in which he used this model to describe price fluctuations 
at the Paris Bourse. Bachelier had even given a course 
as a "free professor" at the Sorbonne University with the 
title: "Probability calculus with applications to finan-
cial operations and analogies with certain questions from
physics" (see the historical articles in Courtault et al.
(2000); Taqqu (2001); Forfar (2002)).

Then Ito, following the works of Bachelier, Wiener,
and Kolmogorov among many, formulated the presently
known Ito calculus (Ito and McKean (1996)). The ge-
ometric Brownian motion, belonging to the class of
Ito processes, later became an important ingredient
of models in Economics (Osborne (1959); Samuelson
(1965)), and in the well-known theory of option pric-
ing (Black and Scholes (1973); Merton (1973)). In
fact, stochastic calculus of diffusion processes combined
with classical hypotheses in Economics led to the devel-
opment of the arbitrage pricing theory (Duffie (1996),
Follmer and Schied (2004)). The deregulation of finan-
cial markets at the end of the 1980s led to the expo-
nential growth of the financial industry. Mathematical 
finance followed the trend: stochastic finance with diffu- 
sion processes and exponential growth of financial deriva- 
tives have had intertwined developments. Finally, this 
relationship was carved in stone when the Nobel prize 
was given to M.S. Scholes and R.C. Merton in 1997 (F. 
Black died in 1995) for their contribution to the theory of 
option pricing and their celebrated "Black-Scholes" for- 
mula. 
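
As an illustration of the geometric Brownian motion mentioned above, the following sketch simulates a path on a regular time grid and computes its log-returns, which are Gaussian by construction. The function name and all parameter values (drift, volatility, time step) are illustrative assumptions.

```python
import math
import random

def simulate_gbm(s0, mu, sigma, dt, n_steps, rng):
    """Exact discretisation of dS = mu*S*dt + sigma*S*dW on a regular grid."""
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        step = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(step))
    return path

rng = random.Random(6)
# Illustrative parameters only: 5% drift, 20% volatility, daily steps.
path = simulate_gbm(s0=100.0, mu=0.05, sigma=0.2, dt=1.0 / 252,
                    n_steps=20000, rng=rng)
log_ret = [math.log(b / a) for a, b in zip(path, path[1:])]
mean = sum(log_ret) / len(log_ret)
var = sum((x - mean) ** 2 for x in log_ret) / len(log_ret)
# Excess kurtosis: close to zero for Gaussian log-returns.
kurt = sum((x - mean) ** 4 for x in log_ret) / len(log_ret) / var ** 2 - 3.0
```

In this toy setting the excess kurtosis of the log-returns stays near zero, which is the Gaussian benchmark against which empirical distributions of returns are usually compared.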

However, this whole theory is closely linked to clas- 
sical economics hypotheses and has not been grounded 
enough with empirical studies of financial time series.
The Black-Scholes hypothesis of Gaussian log-returns of
prices is in strong disagreement with empirical evidence.
Mandelbrot (1960, 1963) was one of the first to observe
a clear departure from Gaussian behaviour for these fluc-
tuations. It is true that within the framework of stochas-
tic finance and martingale modelling, more complex pro-
cesses have been considered in order to take into ac-
count some empirical observations: jump processes (see
e.g. Cont and Tankov (2004) for a textbook treatment)
and stochastic volatility (e.g. Heston (1993); Gatheral
(2006)) in particular. But recent events on financial
markets and the succession of financial crashes (see e.g.
Kindleberger and Aliber (2005) for a historical perspec-
tive) should lead scientists to re-think basic concepts of 
modelling. This is where Econophysics is expected to 
come into play. During the past decades, the financial
landscape has been dramatically changing: deregulation
of markets, growing complexity of products. From a tech-
nical point of view, the ever rising speed and decreasing
costs of computational power and networks have led to 
the emergence of huge databases that record all trans- 
actions and order book movements up to the millisec- 
ond. The availability of these data should lead to mod- 
els that are better empirically founded. Statistical facts 
and empirical models will be reviewed in this article and 
its companion paper. The recent turmoil on financial 
markets and the 2008 crash seem to plead for new mod- 
els and approaches. The Econophysics community thus 
has an important role to play in future financial market
modelling, as suggested by contributions from Bouchaud
(2008), Lux and Westerhoff (2009) or Farmer and Foley
(2009).



C. A growing interdisciplinary field 

The chronological development of Econophysics has
been well covered in the book of Roehner (2002). Here
it is worth mentioning a few landmarks. The first ar-
ticle on analysis of finance data which appeared in a
physics journal was that of Mantegna (1991). The first 
conference in Econophysics was held in Budapest in 1997 
and has been since followed by numerous schools, work- 
shops and the regular series of meetings: APFA (Appli- 
cation of Physics to Financial Analysis), WEHIA (Work- 
shop on Economic Heterogeneous Interacting Agents), 
and Econophys-Kolkata, amongst others. In recent
years the number of papers has increased dramatically; 
the community has grown rapidly and several new direc- 
tions of research have opened. By now renowned physics 
journals like the Reviews of Modern Physics, Physical 
Review Letters, Physical Review E, Physica A, Euro- 
physics Letters, European Physical Journal B, Interna- 
tional Journal of Modern Physics C, etc. publish papers 
in this interdisciplinary area. Economics and mathemat- 
ical finance journals, especially Quantitative Finance, re- 
ceive contributions from many physicists. The interested 
reader can also follow the developments quite well from 
the preprint server (www.arxiv.org). In fact, recently a
new section called quantitative finance has been added
to it. One could also visit the web sites of the Econo-
physics Forum (www.unifr.ch/econophysics) and Econo-
physics.Org (www.econophysics.org). The first textbook
in Econophysics (Sinha et al. (2010)) is also in press.



D. Organization of the review 

This article aims at reviewing recent empirical and the- 
oretical developments that use tools from Physics in the 
fields of Economics and Finance. In section II of this
paper, empirical studies revealing statistical properties
of financial time series are reviewed. We present the
widely acknowledged "stylized facts" describing the dis-
tribution of the returns of financial assets. In section III
we continue with the statistical properties observed on
order books in financial markets. We reproduce most of
the stated facts using our own high-frequency financial
database. In the last part of this article (section IV),
we review contributions on correlation on financial mar-
kets, among which the computation of correlations using
high-frequency data, analyses based on random matrix
theory and the use of correlations to build economics
taxonomies. In the companion paper to follow, Econo-
physics models are reviewed through the point of view 
of agent-based modelling. Using previous work origi- 
nally presented in the fields of behavioural finance and 
market microstructure theory, econophysicists have de- 
veloped agent-based models of order-driven markets that 
are extensively reviewed there. We then turn to models of 
wealth distribution where an agent-based approach also 
prevails. As mentioned above, Econophysics models help
bring a new look at some Economics observations, and 
advances based on kinetic theory models are presented. 
Finally, a detailed review of game theory models and the 
now classic minority games composes the final part. 



II. STATISTICS OF FINANCIAL TIME SERIES: PRICE,
RETURNS, VOLUMES, VOLATILITY

Recording a sequence of prices of commodities or as-
sets produces what is called a time series. Analysis of fi-
nancial time series has been of great interest not only
to the practitioners (an empirical discipline) but also to
the theoreticians for making inferences and predictions.
The inherent uncertainty in the financial time series and
its theory makes it especially interesting to economists,
statisticians and physicists (Tsay (2005)).

Different kinds of financial time series have been
recorded and studied for decades, but the scale changed
twenty years ago. The computerization of stock ex-
changes that took place all over the world in the mid
1980s and early 1990s has led to an explosion of the
amount of data recorded. Nowadays, all transactions on
a financial market are recorded tick-by-tick, i.e. every
event on a stock is recorded with a timestamp defined up
to the millisecond, leading to huge amounts of data. For
example, as of today (2010), the Reuters Datascope Tick
History (RDTH) database records roughly 25 gigabytes
of data every trading day.

Prior to this improvement in recording market activ-
ity, statistics could be computed with daily data at best.
Now scientists can compute intraday statistics in high-
frequency. This allows one to check known properties at new
time scales (see e.g. section II B below), but also implies
special care in the treatment (see e.g. the computation
of correlation on high-frequency in section IV A below).

It is a formidable task to make an exhaustive review
on this topic but we try to give a flavour of some of the
aspects in this section.



A. "Stylized facts" of financial time series

The concept of "stylized facts" was introduced in
macroeconomics around 1960 by Kaldor (1961), who ad-
vocated that a scientist studying a phenomenon "should
be free to start off with a stylized view of the facts". In
his work, Kaldor isolated several statistical facts char-
acterizing macroeconomic growth over long periods and
in several countries, and took these robust patterns as a
starting point for theoretical modelling.

This expression has thus been adopted to describe em-
pirical facts that arose in statistical studies of financial
time series and that seem to be persistent across various
time periods, places, markets, assets, etc. One can find
many different lists of these facts in several reviews (e.g.
Bollerslev et al. (1994); Pagan (1996); Guillaume et al.
(1997); Cont (2001)). We choose in this article to present
a minimum set of facts now widely acknowledged, at least
for the prices of equities.



1. Fat-tailed empirical distribution of returns

Let $p(t)$ be the price of a financial asset at time $t$. We
define its return over a period of time $\tau$ to be:

$$ r_\tau(t) = \frac{p(t+\tau) - p(t)}{p(t)} \approx \log(p(t+\tau)) - \log(p(t)) \qquad (1) $$
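
As a small numerical illustration of definition (1), the sketch below computes simple returns and log-returns from a list of prices sampled on a regular grid. The price values and function names are hypothetical and serve only to show that the two definitions nearly coincide when price changes are small.

```python
import math

def simple_returns(prices, lag=1):
    """Simple returns (p(t+lag) - p(t)) / p(t)."""
    return [(prices[i + lag] - prices[i]) / prices[i]
            for i in range(len(prices) - lag)]

def log_returns(prices, lag=1):
    """Log-returns log p(t+lag) - log p(t)."""
    return [math.log(prices[i + lag]) - math.log(prices[i])
            for i in range(len(prices) - lag)]

# Hypothetical prices on a regular time grid (illustrative values only).
prices = [100.0, 100.5, 100.2, 101.0, 100.8]
r_simple = simple_returns(prices)
r_log = log_returns(prices)
# Largest gap between the two definitions over the sample.
max_gap = max(abs(a - b) for a, b in zip(r_simple, r_log))
```

The `lag` argument plays the role of the period over which the return is computed, expressed in number of sampling steps.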



It has been largely observed - starting with Mandelbrot
(1963), see e.g. Gopikrishnan et al. (1999) for tests on
more recent data - and it is the first stylized fact, that
the empirical distributions of financial returns and log-
returns are fat-tailed. On figure 1 we reproduce the em-
pirical density function of normalized log-returns from
Gopikrishnan et al. (1999) computed on the S&P500 in-
dex. In addition, we plot similar distributions for unnor-
malized returns on a liquid French stock (BNP Paribas)
with $\tau = 5$ minutes. This graph is computed by sampling
a set of tick-by-tick data from 9:05am till 5:20pm between
January 1st, 2007 and May 30th, 2008, i.e. 356 days of
trading. Except where mentioned otherwise in captions,
this data set will be used for all empirical graphs in this
section. On figure 2, the cumulative distribution in log-log
scale from Gopikrishnan et al. (1999) is reproduced. We
also show the same distribution in linear-log scale com-
puted on our data for a larger time scale $\tau = 1$ day,
showing similar behaviour.

FIG. 1. (Top) Empirical probability density function of the
normalized 1-minute S&P500 returns between 1984 and 1996.
Reproduced from Gopikrishnan et al. (1999). (Bottom) Em-
pirical probability density function of BNP Paribas unnor-
malized log-returns over a period of time $\tau = 5$ minutes.

FIG. 2. Empirical cumulative distributions of S&P 500 daily
returns. (Top) Reproduced from Gopikrishnan et al. (1999),
in log-log scale. (Bottom) Computed using official daily close
price between January 1st, 1950 and June 15th, 2009, i.e.
14956 values, in linear-log scale.

Many studies obtain similar observations on different 
sets of data. For example, using two years of data on
more than a thousand US stocks, Gopikrishnan et al.
(1998) find that the cumulative distribution of returns
asymptotically follows a power law $F(r_\tau) \sim |r|^{-\alpha}$ with
$\alpha > 2$ ($\alpha \approx 2.8 - 3$). With $\alpha > 2$, the second mo-
ment (the variance) is well-defined, excluding stable laws
with infinite variance. There have been various sugges-
tions for the form of the distribution: Student's-t, hyper-
bolic, normal inverse Gaussian, exponentially truncated
stable, and others, but no general consensus exists on
the exact form of the tails. Although it is the most
widely acknowledged and the most elementary one, this
stylized fact is not easily met by all financial modelling.
Gabaix et al. (2006) or Wyart and Bouchaud (2007) re-
call that efficient market theory has difficulties in ex-
plaining fat tails. Lux and Sornette (2002) have shown
that models known as "rational expectation bubbles",
popular in economics, produced very fat-tailed distribu-
tions ($\alpha < 1$) that were in disagreement with the statis-
tical evidence.
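
One common way to estimate a tail exponent such as the one discussed above is the Hill estimator, sketched below on synthetic data. This is not necessarily the procedure used in the cited studies; the function name, the choice of `k` and the Pareto-distributed sample are assumptions made for illustration.

```python
import math
import random

def hill_tail_exponent(values, k):
    """Hill estimate of the tail exponent from the k largest absolute values."""
    tail = sorted((abs(v) for v in values), reverse=True)[:k + 1]
    threshold = tail[k]                      # (k+1)-th largest value
    return k / sum(math.log(v / threshold) for v in tail[:k])

random.seed(0)
# Synthetic sample with a known tail exponent of 3 (not market data).
sample = [random.paretovariate(3.0) for _ in range(20000)]
alpha_hat = hill_tail_exponent(sample, k=2000)
```

On real returns the estimate depends noticeably on the number `k` of tail observations retained, so it is usually inspected over a range of `k` rather than read off a single value.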



2. Absence of autocorrelations of returns 



On figure 3 we plot the autocorrelation of log-returns
defined as $\rho(T) \sim \langle r_\tau(t+T) r_\tau(t) \rangle$ with $\tau = 1$ minute
and 5 minutes. We observe here, as it is widely known
(see e.g. Pagan (1996); Cont et al. (1997)), that there
is no evidence of correlation between successive returns,
which is the second "stylized-fact". The autocorrelation
function decays very rapidly to zero, even for a few lags
of 1 minute.
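
The autocorrelation function used above can be estimated with a few lines of code. The sketch below uses a standard sample estimator on synthetic independent Gaussian "returns", which have no linear memory by construction; the function name and the synthetic data are illustrative assumptions.

```python
import random

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given positive lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var

random.seed(1)
# Synthetic i.i.d. Gaussian returns (not market data).
returns = [random.gauss(0.0, 1.0) for _ in range(5000)]
acf = [autocorrelation(returns, lag) for lag in range(1, 6)]
```

For such a series every value in `acf` fluctuates around zero, with a typical size of order one over the square root of the sample length.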



FIG. 3. Autocorrelation function of BNPP.PA returns.

FIG. 4. Autocorrelation function of BNPP.PA absolute re-
turns.



3. Volatility clustering 

The third "stylized-fact" that we present here is of pri-
mary importance. Absence of correlation between re-
turns must not be mistaken for a property of indepen-
dence and identical distribution: price fluctuations are
not identically distributed and the properties of the dis-
tribution change with time.

In particular, absolute returns or squared returns ex-
hibit a long-range slowly decaying autocorrelation func-
tion. This phenomenon is widely known as "volatility
clustering", and was formulated by Mandelbrot (1963)
as "large changes tend to be followed by large changes -
of either sign - and small changes tend to be followed by
small changes".

FIG. 5. Distribution of log-returns of S&P 500 daily, weekly
and monthly returns. Same data set as figure 2 (bottom).

On figure 4, the autocorrelation function of absolute
returns is plotted for $\tau = 1$ minute and 5 minutes. The
levels of autocorrelations at the first lags vary wildly with
the parameter $\tau$. On our data, it is found to be maxi-
mum (more than 70% at the first lag) for returns sam-
pled every five minutes. However, whatever the sampling
frequency, autocorrelation is still above 10% after several
hours of trading. On this data, we can roughly fit a power
law decay with exponent 0.4. Other empirical tests re-
port exponents between 0.1 and 0.3 (Cont et al. (1997);
Liu et al. (1997); Cizeau et al. (1997)).
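
The difference between uncorrelated returns and correlated absolute returns can be reproduced with a toy stochastic-volatility series, sketched below. The dynamics (an autoregressive log-volatility with coefficient 0.97) and all names are assumptions chosen for illustration; this toy process has exponentially decaying memory and does not reproduce the slow power-law decay reported above.

```python
import math
import random

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given positive lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var

rng = random.Random(2)
log_vol, returns = 0.0, []
for _ in range(40000):
    # Assumed toy dynamics: slowly varying log-volatility.
    log_vol = 0.97 * log_vol + 0.15 * rng.gauss(0.0, 1.0)
    returns.append(math.exp(log_vol) * rng.gauss(0.0, 1.0))

acf_returns = autocorrelation(returns, 1)                 # close to zero
acf_abs = autocorrelation([abs(r) for r in returns], 1)   # clearly positive
```

The signs of successive returns are independent here, so the returns themselves are uncorrelated, while their magnitudes inherit the persistence of the volatility.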



4. Aggregational normality 

It has been observed that as one increases the time 
scale over which the returns are calculated, the fat-tail 
property becomes less pronounced, and their distribu-
tion approaches the Gaussian form, which is the fourth
"stylized-fact". This cross-over phenomenon is docu-
mented in Kullmann et al. (1999) where the evolution
of the Pareto exponent of the distribution with the time
scale is studied. On figure 5 we plot these standardized
distributions for the S&P 500 index between January 1st,
1950 and June 15th, 2009. It is clear that the larger the
time scale, the more Gaussian the distribution
is. The fact that the shape of the distribution changes
with $\tau$ makes it clear that the random process underlying
prices must have non-trivial temporal structure.
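
A simple way to quantify aggregational normality is to follow the excess kurtosis of returns as they are summed over longer and longer blocks. The sketch below does this for synthetic independent Laplace-distributed returns, for which the excess kurtosis is 3 at the finest scale and decreases as the inverse of the block size. The independence assumption and all names are illustrative; real returns are dependent, so their convergence is expected to be slower.

```python
import random

def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

def aggregate(returns, k):
    """Sum consecutive log-returns in non-overlapping blocks of size k."""
    return [sum(returns[i:i + k]) for i in range(0, len(returns) - k + 1, k)]

random.seed(3)
# Synthetic i.i.d. Laplace returns (difference of two exponentials).
fine = [random.expovariate(1.0) - random.expovariate(1.0)
        for _ in range(100000)]
kurt = {k: excess_kurtosis(aggregate(fine, k)) for k in (1, 5, 20)}
```

Log-returns are additive over time, which is why aggregation to a longer time scale reduces to a block sum.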



B. Getting the right "time" 

1. Four ways to measure "time" 

In the previous section, all "stylized facts" have been 
presented in physical time, or calendar time, i.e. time 
series were indexed, as we expect them to be, in hours, 
minutes, seconds, milliseconds. Let us recall here that 
tick-by-tick data available on financial markets all over 
the world is time-stamped up to the millisecond, but the 
order of magnitude of the guaranteed precision is much
larger, usually one second or a few hundreds of millisec-
onds.

Calendar time is the time usually used to compute sta- 
tistical properties of financial time series. This means 
that computing these statistics involves sampling, which 
might be a delicate thing to do when dealing for example 
with several stocks with different liquidity. Therefore, 
three other ways to keep track of time may be used. 

Let us first introduce event time. Using this count, 
time is increased by one unit each time one order is sub- 
mitted to the observed market. This framework is nat-
ural when dealing with the simulation of financial mar-
kets, as will be shown in the companion paper. The
main outcome of event time is its "smoothing" of data.
In event time, intraday seasonality (lunch break) or out-
bursts of activity consequent to some news are smoothed 
in the time series, since we always have one event per 
time unit. 

Now, when dealing with time series of prices, another 
count of time might be relevant, and we call it trade time 
or transaction time. Using this count, time is increased 
by one unit each time a transaction happens. The advan-
tage of this count is that limit orders submitted far away
in the order book, which may thus be of lesser importance 
with respect to the price series, do not increase the clock 
by one unit. 

Finally, going on with focusing on important events to 
increase the clock, we can use tick time. Using this count, 
time is increased by one unit each time the price changes. 
Thus consecutive market orders that progressively "eat" 
liquidity until the first best limit is removed in an order 
book are counted as one time unit. 

Let us finish by noting that with these definitions, 
when dealing with mid prices, or bid and ask prices, a 
time series in event time can easily be extracted from a 
time series in calendar time. Furthermore, one can al-
ways extract a time series in trade time or in tick time
from a time series in event time. However, one cannot
extract a series in tick time from a series in trade time,
as the latter ignores limit orders that are submitted in-
side the spread, and thus change mid, bid or ask prices
without any transaction taking place.
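
The relations between these clocks can be sketched on a toy event log. In the example below, each record holds a calendar timestamp, an event type and the mid price after the event; this record layout, the event labels and the values are hypothetical and only illustrate the definitions given above.

```python
# Hypothetical records: (calendar time in seconds, kind, mid price after event)
events = [
    (0.10, "limit", 100.00),
    (0.35, "trade", 100.00),
    (0.36, "trade", 100.05),
    (1.20, "limit", 100.10),    # limit order inside the spread moves the mid
    (2.75, "cancel", 100.05),   # its cancellation moves the mid back
    (3.00, "trade", 100.05),
]

def event_time(events):
    """One clock tick per submitted event."""
    return [price for _, _, price in events]

def trade_time(events):
    """One clock tick per transaction."""
    return [price for _, kind, price in events if kind == "trade"]

def tick_time(events):
    """One clock tick per change of the price."""
    series = []
    for _, _, price in events:
        if not series or price != series[-1]:
            series.append(price)
    return series
```

Here `trade_time(events)` never sees the value 100.10, which is reached and left without any transaction, whereas `tick_time(events)` records it. Both series are obtained from the event-time series, but the second cannot be rebuilt from the first.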



2. Revisiting "stylized facts" with a new clock 

Now, using the right clock might be of primary impor-
tance when dealing with statistical properties and esti-
mators. For example, Griffin and Oomen (2008) investi-
gate the standard realized variance estimator (see sec-
tion IV A) in trade time and tick time. Muni Toke (2010)
also recalls that the differences observed on a spread dis-
tribution in trade time and physical time are meaning-
ful. In this section we compute some statistics comple-
mentary to the ones we have presented in the previous
section II A and show the role of the clock in the studied
properties.
FIG. 6. Distribution of log-returns of stock BNPP.PA. This 
empirical distribution is computed using data from 2007, 
April 1st until 2008, May 31st. 



a. Aggregational normality in trade time We have 
seen above that when the sampling size increases, the dis- 
tribution of the log-returns tends to be more Gaussian. 
This property is much better seen using trade time. On
figure 6 we plot the distributions of the log-returns for
BNP Paribas stock using 2-month-long data in calendar
time and trade time. Over this period, the average num-
ber of trades per day is 8562, so that 17 trades (resp. 1049
trades) corresponds to an average calendar time step of
1 minute (resp. 1 hour). We observe that the distribu-
tion of returns sampled every 1049 trades is much more
Gaussian than the one sampled every 17 trades (aggre-
gational normality), and that it is also more Gaussian
than the one sampled every 1 hour (quicker convergence
in trade time).

Note that this property appears to be valid in a mul-
tidimensional setting, see Huth and Abergel (2009).

b. Autocorrelation of trade signs in tick time It is well-known
that the series of the signs of the trades on a given stock (usual
convention: +1 for a transaction at the ask price, -1 for a
transaction at the bid price) exhibit large autocorrelation. It
has been observed in Lillo and Farmer (2004) for example that the
autocorrelation function of the signs of trades $(\epsilon_n)$ was
a slowly decaying function in $n^{-\alpha}$, with
$\alpha \approx 0.5$. We compute this statistic for the trades on
BNP Paribas stock from 2007, January 1st until 2008, May 31st. We
plot the result in figure 7. We find that the first values for
short lags are about 0.3, and that the log-log plot clearly shows
some power-law decay with roughly $\alpha \approx 0.7$.
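
The statistic described above can be sketched as follows: a sample autocorrelation of the +1/-1 sign series, and a least-squares slope on the log-log plot to estimate $\alpha$. This is only an illustration under stated assumptions: the sign series is produced by a hypothetical toy generator (runs of same-sign trades with Pareto-distributed lengths), and the fit uses ordinary least squares, which may differ from the procedure behind figure 7.

```python
import math
import random

def sign_autocorrelation(signs, lag):
    """Sample autocorrelation of a +1/-1 trade-sign series at a given lag."""
    n = len(signs)
    mean = sum(signs) / n
    var = sum((s - mean) ** 2 for s in signs) / n
    cov = sum((signs[t] - mean) * (signs[t + lag] - mean)
              for t in range(n - lag)) / (n - lag)
    return cov / var

def fit_decay_exponent(lags, acf_values):
    """Least-squares slope of log(acf) against log(lag), returned as alpha.

    Only strictly positive autocorrelations are kept (log-log plot).
    """
    pts = [(math.log(l), math.log(c))
           for l, c in zip(lags, acf_values) if c > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    return -slope

def synthetic_split_order_signs(n_trades, tail=1.5, seed=0):
    """Hypothetical toy model: each large order is split into a run of
    same-sign trades whose length is heavy-tailed (Pareto)."""
    rng = random.Random(seed)
    signs = []
    while len(signs) < n_trades:
        side = rng.choice((1, -1))
        signs.extend([side] * int(rng.paretovariate(tail)))
    return signs[:n_trades]

signs = synthetic_split_order_signs(50_000)
lags = [1, 2, 4, 8, 16, 32]
acf = [sign_autocorrelation(signs, lag) for lag in lags]
alpha = fit_decay_exponent(lags, acf)
print(f"acf at lag 1 = {acf[0]:.2f}, fitted alpha = {alpha:.2f}")
```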

A very plausible explanation of this phenomenon relies on the
execution strategies of some major brokers on a given market.
These brokers have large transactions to execute on the account of
some clients. In order to avoid making the market move because of
an inconsiderately large order (see section III F below on market
impact), they tend to split large orders into small ones. We think
that these strategies explain, at least partly, the large
autocorrelation observed. Using data on markets where orders are
publicly identified and linked to a given broker, it can be shown
that the autocorrelation function of the order signs of a given
broker is even higher. See Bouchaud et al. (2009) for a review of
these facts and some associated theories.

FIG. 7. Auto-correlation of trade signs for stock BNPP.PA.

FIG. 8. Second moment of the distribution of returns over N
trades for the stock BNPP.PA.


We present here further evidence supporting this explanation.
We compute the autocorrelation function of order signs in tick
time, i.e. taking into account only transactions that make the
price change. Results are plotted on figure 7. We find that the
first values for short lags are about 0.10, which is much smaller
than the values observed with the previous time series. This
supports the idea that many small transactions progressively "eat"
the available liquidity at the best quotes. Note however that even
in tick time, the correlation remains positive for large lags
also.

FIG. 9. Average number of trades and average volatility on
a time period $\tau$ for the stock BNPP.PA.



3. Correlation between volume and volatility 



Investigating time series of cotton prices, Clark (1973) noted
that "trading volume and price change variance seem to have a
curvilinear relationship". Trade time allows us to have a better
view on this property: Plerou et al. (2000) and Silva and
Yakovenko (2007) among others, show that the variance of
log-returns after N trades, i.e. over a time period of N in trade
time, is proportional to N. We confirm this observation by
plotting the second moment of the distribution of log-returns
after N trades as a function of N for our data, as well as the
average number of trades and the average volatility on a given
time interval. The results are shown on figures 8 and 9.
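
The proportionality between the second moment of N-trade returns and N can be checked with a few lines of code. The sketch below is illustrative only: it uses synthetic independent Gaussian per-trade returns (an assumption, for which the proportionality holds by construction), whereas the figures are computed on recorded trades.

```python
import random

def second_moment_after_n_trades(trade_returns, n):
    """Mean squared log-return over non-overlapping windows of n trades."""
    sums = [sum(trade_returns[k:k + n])
            for k in range(0, len(trade_returns) - n + 1, n)]
    return sum(s * s for s in sums) / len(sums)

rng = random.Random(1)
# Hypothetical per-trade log-returns (independent, Gaussian).
trade_returns = [rng.gauss(0.0, 1e-4) for _ in range(100_000)]

moments = {n: second_moment_after_n_trades(trade_returns, n)
           for n in (1, 5, 20, 50)}
# If the second moment is proportional to N, each ratio is close to N.
ratios = {n: m / moments[1] for n, m in moments.items()}
print(ratios)
```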

These results are to be put in relation to the ones presented in
Gopikrishnan et al. (2000b), where the statistical properties of
the number of shares traded $Q_{\Delta t}$ for a given stock in a
fixed time interval $\Delta t$ are studied. They analyzed
transaction data for the largest 1000 stocks for the two-year
period 1994-95, using a database that recorded every transaction
for all securities in three major US stock markets. They found
that the distribution $P(Q_{\Delta t})$ displayed a power-law
decay as shown in Fig. 10, and that the time correlations in
$Q_{\Delta t}$ displayed long-range persistence. Further, they
investigated the relation between $Q_{\Delta t}$ and the number of
transactions $N_{\Delta t}$ in a time interval $\Delta t$, and
found that the long-range correlations in $Q_{\Delta t}$ were
largely due to those of $N_{\Delta t}$. Their results are
consistent with the interpretation that the large equal-time
correlation previously found between $Q_{\Delta t}$ and the
absolute value of price change $|G_{\Delta t}|$ (related to
volatility) was largely due to $N_{\Delta t}$.

Therefore, studying the variance of price changes in trade time
suggests that the number of trades is a good proxy for the
unobserved volatility.

to a symmetric stable distribution (see Feller (1968)).
Clark (1973) tests empirically a log-normal subordination with
time series of prices of cotton. In a similar way, Silva and
Yakovenko (2007) find that an exponential subordination with a
kernel:

$$K_\tau(N) = \frac{1}{\eta\tau}\, e^{-\frac{N}{\eta\tau}} \qquad (5)$$

is in good agreement with empirical data. If the orders were
submitted to the market in an independent way and at a constant
rate $\eta$, then the number of trades per time period $\tau$
should follow a Poisson distribution with intensity $\eta\tau$.
Therefore, the empirical fit of equation (5) is inconsistent with
such a simplistic hypothesis on the distribution of the arrival
times of orders. We will suggest in the next section some possible
distributions that fit our empirical data.
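
To make the contrast between the two hypotheses concrete, the following sketch simulates subordinated Gaussian returns, $x \sim \mathcal{N}(0, \alpha N)$, with the number of trades $N$ drawn either from the exponential kernel of equation (5) or from a Poisson law with the same mean. It is a toy simulation, not the authors' procedure; the parameter values and function names are arbitrary assumptions. With the exponential kernel the mixture is a Laplace (double exponential) law, whose excess kurtosis is 3, whereas the Poisson case stays close to Gaussian (excess kurtosis $3/(\eta\tau)$).

```python
import math
import random

def excess_kurtosis(xs):
    """Sample excess kurtosis; close to 0 for Gaussian data."""
    m = sum(xs) / len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m4 = sum((x - m) ** 4 for x in xs) / len(xs)
    return m4 / m2 ** 2 - 3.0

def poisson_count(rng, mean):
    """Number of unit-rate arrivals in [0, mean], i.e. a Poisson(mean) draw."""
    count, clock = 0, rng.expovariate(1.0)
    while clock <= mean:
        count += 1
        clock += rng.expovariate(1.0)
    return count

def subordinated_returns(draw_n, alpha, n_samples, rng):
    """Draw x ~ Normal(0, alpha * N), N coming from the directing process."""
    out = []
    for _ in range(n_samples):
        n = draw_n()
        out.append(rng.gauss(0.0, math.sqrt(alpha * n)) if n > 0 else 0.0)
    return out

rng = random.Random(2)
mean_trades = 20.0   # plays the role of eta * tau (arbitrary value)
alpha = 1e-6         # variance per trade (arbitrary value)

x_exp = subordinated_returns(
    lambda: rng.expovariate(1.0 / mean_trades), alpha, 50_000, rng)
x_poi = subordinated_returns(
    lambda: poisson_count(rng, mean_trades), alpha, 50_000, rng)

k_exp = excess_kurtosis(x_exp)
k_poi = excess_kurtosis(x_poi)
print(f"excess kurtosis: exponential kernel = {k_exp:.2f}, Poisson = {k_poi:.2f}")
```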



III. STATISTICS OF ORDER BOOKS 



FIG. 10. Distribution of the number of shares traded $Q_{\Delta t}$.
Adapted from Gopikrishnan et al. (2000b).



4. A link with stochastic processes: subordination 

These empirical facts (aggregational normality in trade time,
relationship between volume and volatility) reinforce the interest
for models based on the subordination of stochastic processes,
which had been introduced in financial modeling by Clark (1973).

Let us introduce it here. Assuming the proportionality between
the variance $\langle x^2 \rangle_\tau$ of the centred returns $x$
and the number of trades $N_\tau$ over a time period $\tau$, we can
write:

$$\langle x^2 \rangle_\tau = \alpha N_\tau. \qquad (2)$$

Therefore, assuming the normality in trade time, we can write the
density function of log-returns after $N$ trades as

$$P_N(x) = \frac{e^{-\frac{x^2}{2\alpha N}}}{\sqrt{2\pi\alpha N}}. \qquad (3)$$

Finally, denoting $K_\tau(N)$ the probability density function of
having $N$ trades in a time period $\tau$, the distribution of log
returns in calendar time can be written

$$P_\tau(x) = \int_0^\infty \frac{e^{-\frac{x^2}{2\alpha N}}}{\sqrt{2\pi\alpha N}}\, K_\tau(N)\, dN. \qquad (4)$$

This is the subordination of the Gaussian process $x_N$ using the
number of trades $N_\tau$ as the directing process, i.e. as the new
clock. With this kind of model, it is expected that, since $P_N$
is Gaussian, the observed non-Gaussian behavior will come from
$K_\tau(N)$. For example, some specific choice of directing
processes may lead



The computerization of financial markets in the second half of
the 1980's provided the empirical scientists with easier access to
extensive data on order books. Biais et al. (1995) is an early
study of the new data flows on the newly (at that time)
computerized Paris Bourse. Variables crucial to a fine modeling of
order flows and dynamics of order books are studied: time of
arrival of orders, placement of orders, size of orders, shape of
order book, etc. Many subsequent papers offer complementary
empirical findings and modelling, e.g. Gopikrishnan et al.
(2000a), Challet and Stinchcombe (2001), Maslov and Mills (2001),
Bouchaud et al. (2002), Potters and Bouchaud (2003). Before going
further in our review of available models, we try to summarize
some of these empirical facts.

For each of the enumerated properties, we present new empirical
plots. We use Reuters tick-by-tick data on the Paris Bourse. We
select four stocks: France Telecom (FTE.PA), BNP Paribas
(BNPP.PA), Societe Generale (SOGN.PA) and Renault (RENA.PA). For
any given stock, the data displays time-stamps, traded quantities,
traded prices, the first five best-bid limits and the first five
best-ask limits. From now on, we will denote $a_i(t)$ (resp.
$b_j(t)$) the price of the i-th limit at ask (resp. j-th limit at
bid). Except when mentioned otherwise, all statistics are computed
using all trading days from Oct, 1st 2007 to May, 30th 2008, i.e.
168 trading days. On a given day, orders submitted between 9:05am
and 5:20pm are taken into account, i.e. first and last minutes of
each trading day are removed.

Note that we do not deal in this section with the correlations of
the signs of trades, since statistical results on this fact have
already been treated in section II B 2. Note also that although
most of these facts are widely acknowledged, we will not describe
them as new "stylized facts for order books" since their ranges of
validity are still to be checked among various products/stocks,
markets and epochs, and strong properties need to be properly
extracted and formalized from these observations. However, we will
keep them in mind as we go through the new trend of "empirical
modeling" of order books.

Finally, let us recall that the markets we are dealing with are
electronic order books with no official market maker, in which
orders are submitted in a double auction and executions follow
price/time priority. This type of exchange is now adopted nearly
all over the world, but this was not obvious as long as
computerization was not complete. Different market mechanisms have
been widely studied in the microstructure literature, see e.g.
Garman (1976); Kyle (1985); Glosten (1994); O'Hara (1997); Biais
et al. (1997); Hasbrouck (2007). We will not review this
literature here (except Garman (1976) in our companion paper), as
this would be too large a digression. However, such a literature
is linked in many aspects to the problems reviewed in this paper.



FIG. 11. Distribution of interarrival times for stock BNPP.PA
in log-scale. (Legend: BNPP.PA, Lognormal, Exponential, Weibull;
horizontal axis: interarrival time.)



A. Time of arrivals of orders 

As explained in the previous section, the choice of the time
count might be of prime importance when dealing with "stylized
facts" of empirical financial time series. When reviewing the
subordination of stochastic processes (Clark (1973); Silva and
Yakovenko (2007)), we have seen that the Poisson hypothesis for
the arrival times of orders is not empirically verified.

We compute the empirical distribution for interarrival times, or
durations, of market orders on the stock BNP Paribas using our data
set described in the previous section. The results are plotted in
figures 11 and 12, both in linear and log scale. It is clearly
observed that the exponential fit is not a good one. We check
however that the Weibull distribution fit is potentially a very
good one. Weibull distributions have been suggested for example in
Ivanov et al. (2004). Politi and Scalas (2008) also obtain good
fits with q-exponential distributions.
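
A comparison between exponential and Weibull fits of durations can be sketched by maximum likelihood, as below. This is a minimal illustration, not the fitting procedure used for figures 11 and 12: the durations are synthetic (drawn from a Weibull law with shape 0.6, an arbitrary choice), and the Weibull shape is obtained by a simple bisection on the likelihood equation.

```python
import math
import random

def exponential_loglik(xs):
    """Maximised log-likelihood of an exponential fit (rate = 1 / mean)."""
    rate = len(xs) / sum(xs)
    return len(xs) * math.log(rate) - rate * sum(xs)

def weibull_mle(xs, lo=0.05, hi=10.0, iterations=60):
    """Weibull shape and scale by maximum likelihood (bisection on shape)."""
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)

    def score(k):
        # Likelihood equation for the shape; increasing in k.
        powered = [x ** k for x in xs]
        weighted = sum(p * l for p, l in zip(powered, logs))
        return weighted / sum(powered) - 1.0 / k - mean_log

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = (sum(x ** shape for x in xs) / len(xs)) ** (1.0 / shape)
    return shape, scale

def weibull_loglik(xs, shape, scale):
    """Log-likelihood of a Weibull law with the given shape and scale."""
    return sum(math.log(shape / scale)
               + (shape - 1.0) * math.log(x / scale)
               - (x / scale) ** shape for x in xs)

rng = random.Random(3)
# Hypothetical durations: Weibull with scale 1.0 and shape 0.6.
durations = [rng.weibullvariate(1.0, 0.6) for _ in range(5_000)]
durations = [d for d in durations if d > 0.0]

shape, scale = weibull_mle(durations)
gain = weibull_loglik(durations, shape, scale) - exponential_loglik(durations)
print(f"shape = {shape:.3f}, scale = {scale:.3f}, log-likelihood gain = {gain:.1f}")
```

A shape parameter below 1 means that short durations are more frequent than under the Poisson (exponential) hypothesis, which corresponds to a shape of exactly 1.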

In the Econometrics literature, these observations of
non-Poissonian arrival times have given rise to a large trend of
modelling of irregular financial data. Engle and Russell (1997)
and Engle (2000) have introduced autoregressive conditional
duration or intensity models that may help modelling these
processes of orders' submission. See Hautsch (2004) for a textbook
treatment.

Using the same data, we compute the empirical distribution of the
number of transactions in a given time period $\tau$. Results are
plotted in figure 13. It seems that the log-normal and the gamma
distributions are both good candidates, however neither of them
really describes the empirical result, suggesting a complex
structure of arrival of orders. A similar result on Russian stocks
was presented in Dremin and Leonidov (2005).

FIG. 12. Distribution of interarrival times for stock BNPP.PA
(Main body, linear scale).



B. Volume of orders 

Empirical studies show that the unconditional distribution of
order size is very complex to characterize. Gopikrishnan et al.
(2000a) and Maslov and Mills (2001) observe a power law decay with
an exponent $1 + \mu \approx 2.3 - 2.7$ for market orders and
$1 + \mu \approx 2.0$ for limit orders. Challet and Stinchcombe
(2001) emphasize a clustering property: orders tend to have a
"round" size in packages of shares, and clusters are observed
around 100's and 1000's. As of today, no consensus emerges in
proposed models, and it is plausible that such a distribution
varies very wildly with products and markets.

In figure 14 we plot the distribution of volume of market orders
for the four stocks composing our benchmark. Quantities are
normalized by their mean. The power-law coefficient is estimated by
a Hill estimator (see e.g. Hill (1975); de Haan et al. (2000)). We
find a power law with exponent $1 + \mu \approx 2.7$ which confirms
studies previously cited. Figure 15 displays the same distribution
for limit orders (of all available limits). We find an average
value of $1 + \mu \approx 2.1$, consistent with previous studies.
However, we note that the power law is a poorer fit in the case of
limit orders: data normalized by their mean collapse badly on a
single curve, and computed coefficients vary with stocks.

FIG. 13. Distribution of the number of trades in a given time
period $\tau$ for stock BNPP.PA. This empirical distribution is
computed using data from 2007, October 1st until 2008, May 31st.
(Horizontal axis: normalized number of trades.)

FIG. 14. Distribution of volumes of market orders. Quantities
are normalized by their mean.
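
For reference, the Hill estimator mentioned above can be written in a few lines. The sketch below is only an illustration: the volumes are synthetic (Pareto with tail exponent $\mu = 1.7$, so that the density decays with exponent $1 + \mu = 2.7$), and the number $k$ of upper order statistics is an arbitrary choice that in practice requires care.

```python
import math
import random

def hill_tail_exponent(sample, k):
    """Hill estimate of the tail exponent mu from the k largest observations.

    The probability density then decays with exponent 1 + mu.
    """
    top = sorted(sample, reverse=True)[:k + 1]
    threshold = top[k]  # the (k+1)-th largest value
    mean_log_excess = sum(math.log(x / threshold) for x in top[:k]) / k
    return 1.0 / mean_log_excess

rng = random.Random(4)
# Hypothetical order volumes with a pure power-law tail, mu = 1.7.
volumes = [rng.paretovariate(1.7) for _ in range(50_000)]

mu_hat = hill_tail_exponent(volumes, k=2_000)
density_exponent = 1.0 + mu_hat
print(f"estimated 1 + mu = {density_exponent:.2f}")
```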



C. Placement of orders 



a. Placement of arriving limit orders 



FIG. 15. Distribution of normalized volumes of limit orders. 
Quantities are normalized by their mean. 



Bouchaud et al. (2002) observe a broad power-law placement around
the best quotes on French stocks, confirmed in Potters and
Bouchaud (2003) on US stocks. Observed exponents are quite stable
across stocks, but exchange dependent: $1 + \mu \approx 1.6$ on the
Paris Bourse, $1 + \mu \approx 2.0$ on the New York Stock Exchange,
$1 + \mu \approx 2.5$ on the London Stock Exchange.
Mike and Farmer (2008) propose to fit the empirical distribution
with a Student distribution with 1.3 degrees of freedom.



We plot the distribution of the following quantity computed on
our data set, i.e. using only the first five limits of the order
book: $\Delta p = b_0(t-) - b(t)$ (resp. $a(t) - a_0(t-)$) if a bid
(resp. ask) order arrives at price $b(t)$ (resp. $a(t)$), where
$b_0(t-)$ (resp. $a_0(t-)$) is the best bid (resp. ask) before the
arrival of this order. Results are plotted on figures 16 (in
semilog scale) and 17 (in linear scale). These graphs being
computed with incomplete data (five best limits), we do not observe
a placement as broad as in Bouchaud et al. (2002). However, our
data makes it clear that fat tails are observed. We also observe an
asymmetry in the empirical distribution: the left side is less
broad than the right side. Since the left side represents limit
orders submitted inside the spread, this is expected. Thus, the
empirical distribution of the placement of arriving limit orders
is maximum at zero (same best quote). We then ask the question: How
is it translated in terms of shape of the order book?

b. Average shape of the order book Contrary to what one might
expect, it seems that the maximum of the average offered volume in
an order book is located away from the best quotes (see e.g.
Bouchaud et al. (2002)). Our data confirms this observation: the
average quantity offered on the five best quotes grows with the
level. This result is presented in figure 18. We also compute the
average price of these levels in order to plot a cross-sectional
graph similar to the ones presented in Biais et al. (1995).

FIG. 16. Placement of limit orders using the same best quote
reference in semilog scale. Data used for this computation is BNP
Paribas order book from September 1st, 2007, until May 31st, 2008.
(Horizontal axis: $\Delta p$.)

FIG. 18. Average quantity offered in the limit order book.
(Horizontal axis: level of limit orders; <0: bids, >0: asks.)

FIG. 19. Average limit order book: price and depth.



D. Cancellation of orders



FIG. 17. Placement of limit orders using the same best quote 
reference in linear scale. Data used for this computation is 
BNP Paribas order book from September 1st, 2007, until May 
31st, 2008. 



Our result is presented for stock BNPP.PA in figure 19 and
displays the expected shape. Results for other stocks are similar.
We find that the average gap between two levels is constant among
the five best bids and asks (less than one tick for FTE.PA, 1.5
tick for BNPP.PA, 2.0 ticks for SOGN.PA, 2.5 ticks for RENA.PA). We
also find that the average spread is roughly twice as large as the
average gap (factor 1.5 for FTE.PA, 2 for BNPP.PA, 2.2 for SOGN.PA,
2.4 for RENA.PA).



Challet and Stinchcombe (2001) show that the distribution of the
average lifetime of limit orders fits a power law with exponent
$1 + \mu \approx 2.1$ for cancelled limit orders, and
$1 + \mu \approx 1.5$ for executed limit orders. Mike and Farmer
(2008) find that in either case the exponential hypothesis (Poisson
process) is not satisfied on the market.

We compute the average lifetime of cancelled and executed orders
on our dataset. Since our data does not include a unique
identifier of a given order, we reconstruct the lifetimes of
orders as follows: each time a cancellation is detected, we go back
through the history of limit order submission and look for a
matching order with same price and same quantity. If an order is
not matched, we discard the cancellation from our lifetime data.
Results are presented in figures 20 and 21. We observe a power law
decay with coefficients $1 + \mu \approx 1.3 - 1.6$ for both
cancelled and executed limit orders, with little variation among
stocks. These results are a bit different from the ones presented
in previous studies: similar for executed limit orders, but our
data exhibits a slower decay for cancelled orders. Note that the
observed cut-off in the distribution for lifetimes above 20000
seconds is due to the fact that we do not take into account
execution or cancellation of orders submitted on a previous day.

FIG. 20. Distribution of estimated lifetime of cancelled limit
orders.
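
The lifetime reconstruction described above (matching each cancellation to an earlier submission with the same price and quantity) can be sketched as follows. The event format and function name are hypothetical, and one detail is an assumption on our part: when several earlier submissions match, the most recent one is taken, since the text only specifies that the history is searched backwards.

```python
from collections import defaultdict

def cancelled_order_lifetimes(events):
    """Estimate lifetimes of cancelled limit orders without order identifiers.

    events: iterable of (time, kind, price, quantity), sorted by time,
    with kind in {"submit", "cancel"}.
    """
    pending = defaultdict(list)  # (price, quantity) -> submission times
    lifetimes = []
    for time, kind, price, quantity in events:
        key = (price, quantity)
        if kind == "submit":
            pending[key].append(time)
        elif kind == "cancel":
            if pending[key]:
                # Assumption: the most recent matching submission is used.
                lifetimes.append(time - pending[key].pop())
            # Unmatched cancellations are discarded.
    return lifetimes

# Hypothetical event stream for one trading day (times in seconds).
events = [
    (0.0, "submit", 50.10, 100),
    (1.5, "submit", 50.10, 100),
    (2.0, "submit", 50.20, 300),
    (4.0, "cancel", 50.10, 100),  # matched with the submission at 1.5
    (5.0, "cancel", 50.30, 200),  # no matching submission: discarded
    (9.0, "cancel", 50.10, 100),  # matched with the submission at 0.0
]
lifetimes = cancelled_order_lifetimes(events)
print(lifetimes)
```

Processing each trading day separately, as in this sketch, reproduces the cut-off noted above: orders submitted on a previous day are never matched.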



the observed market activity is larger at the beginning and the
end of the day, and more quiet around mid-day. Such a U-shaped
curve is well-known, see Biais et al. (1995), for example. On our
data, we observe that the number of orders on a 5-minute interval
can vary by a factor of 10 throughout the day.

FIG. 22. Normalized average number of market orders in a
5-minute interval.



FIG. 21. Distribution of estimated lifetime of executed limit
orders. (Legend: power-law fit, BNPP.PA, FTE.PA, RENA.PA, SOGN.PA;
horizontal axis: lifetime for executed limit orders.)

FIG. 23. Normalized average number of limit orders in a 5-minute
interval. (Legend: BNPP.PA and FTE.PA, each with a quadratic fit;
horizontal axis: time of day in seconds.)


E. Intraday seasonality 

Activity on financial markets is of course not constant
throughout the day. Figure 22 (resp. 23) plots the (normalized)
number of market (resp. limit) orders arriving in a 5-minute
interval. It is clear that a U-shape is observed (an ordinary
least-square quadratic fit is plotted):



Challet and Stinchcombe (2001) note that the average number of
orders submitted to the market in a period $\Delta T$ varies wildly
during the day. The authors also observe that these quantities for
market orders and limit orders are highly correlated. Such a type
of intraday variation of the global market activity is a
well-known fact, already observed in Biais et al. (1995), for
example.

F. Market impact



A. Estimating covariance on high-frequency data 



The statistics we have presented may help to understand a
phenomenon of primary importance for any financial market
practitioner: the market impact, i.e. the relationship between the
volume traded and the expected price shift once the order has been
executed. To a first approximation, one understands that it is
closely linked with many items described above: the volume of
market orders submitted, the shape of the order book (how many
pending limit orders are hit by one large market order), the
correlation of trade signs (one may assume that large orders are
split in order to avoid a large market impact), etc.

Many empirical studies are available. An empirical study on the
price impact of individual transactions on 1000 stocks on the NYSE
is conducted in Lillo et al. (2003). It is found that proper
rescaling makes all the curves collapse onto a single concave
master curve. This function increases as a power that is of the
order of 1/2 for small volumes, but then increases more slowly for
large volumes. They obtain similar results in each year for the
period 1995 to 1998.

We will not review any further the large literature of market
impact, but rather refer the reader to the recent exhaustive
synthesis proposed in Bouchaud et al. (2009), where different
types of impacts, as well as some theoretical models are discussed.



IV. CORRELATIONS OF ASSETS 



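Let us assume that we observe $d$ time series of prices or
log-prices $p_i$, $i = 1, \dots, d$, observed at times $t_m$,
$m = 0, \dots, M$. The usual estimator of the covariance of prices
$i$ and $j$ is the realized covariance estimator, which is computed
as:

$$\hat{\Sigma}^{RV}_{ij} = \sum_{m=1}^{M} \left(p_i(t_m) - p_i(t_{m-1})\right)\left(p_j(t_m) - p_j(t_{m-1})\right). \qquad (6)$$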

The problem is that high-frequency tick-by-tick data record
changes of prices when they happen, i.e. at random times.
Tick-by-tick data is thus asynchronous, contrary to daily close
prices for example, that are recorded at the same time for all the
assets on a given exchange. Using standard estimators without
caution could be one cause for the "Epps effect", first observed
in Epps (1979), which stated that "[c]orrelations among price
changes in common stocks of companies in one industry are found to
decrease with the length of the interval for which the price
changes are measured." This has largely been verified since, e.g.
in Bonanno et al. (2001) or Reno (2003). Hayashi and Yoshida (2005)
show that non-synchronicity of tick-by-tick data and necessary
sampling of time series in order to compute the usual realized
covariance estimator partially explain this phenomenon. We very
briefly review here two covariance estimators that do not need any
synchronicity (hence, sampling) in order to be computed.



The word "correlation" is defined as "a relation existing between
phenomena or things or between mathematical or statistical
variables which tend to vary, be associated, or occur together in a
way not expected on the basis of chance alone"¹. When we talk about
correlations in stock prices, what we are really interested in are
relations between variables such as stock prices, order signs,
transaction volumes, etc. and more importantly how these relations
affect the nature of the statistical distributions and laws which
govern the price time series. This section deals with several
topics concerning linear correlation observed in financial data.
The first part deals with the important issue of computing
correlations in high-frequency. As mentioned earlier, the
computerization of financial exchanges has led to the availability
of a huge amount of tick-by-tick data, and computing correlation
using these intraday data raises lots of issues concerning usual
estimators. The second and third parts deal with the use of
correlation in order to cluster assets with potential applications
in risk management problems.



1. The Fourier estimator



The Fourier estimator has been introduced by Malliavin and
Mancino (2002). Let us assume that we have $d$ time series of
log-prices that are observations of Brownian semi-martingales
$p_i$:

$$dp_i = \sum_{j=1}^{K} \sigma_{ij}\, dW_j + \mu_i\, dt, \quad i = 1, \dots, d. \qquad (7)$$



The coefficients of the covariance matrix are then written
$\Sigma_{ij}(t) = \sum_{k=1}^{K} \sigma_{ik}(t)\sigma_{jk}(t)$.
Malliavin and Mancino (2002) show that the Fourier coefficients of
$\Sigma_{ij}(t)$ are, with $n_0$ a given integer:

$$a_k(\Sigma_{ij}) = \lim_{N \to \infty} \frac{\pi}{N + 1 - n_0} \sum_{s=n_0}^{N} \frac{1}{2}\left[a_s(dp_i)\,a_{s+k}(dp_j) + b_{s+k}(dp_i)\,b_s(dp_j)\right], \qquad (8)$$



¹ In Merriam-Webster Online Dictionary. Retrieved June 14, 2010,
from http://www.merriam-webster.com/dictionary/correlations

$$b_k(\Sigma_{ij}) = \lim_{N \to \infty} \frac{\pi}{N + 1 - n_0} \sum_{s=n_0}^{N} \frac{1}{2}\left[a_s(dp_i)\,b_{s+k}(dp_j) - b_s(dp_i)\,a_{s+k}(dp_j)\right], \qquad (9)$$

where the Fourier coefficients $a_k(dp_i)$ and $b_k(dp_i)$ of $dp_i$
can be directly computed on the time series. Indeed, rescaling the
time window on $[0, 2\pi]$ and using integration by parts on
$a_k(dp_i) = \frac{1}{\pi}\int_0^{2\pi} \cos(kt)\, dp_i(t)$, we have:

$$a_k(dp_i) = \frac{p_i(2\pi) - p_i(0)}{\pi} + \frac{k}{\pi} \int_0^{2\pi} \sin(kt)\, p_i(t)\, dt. \qquad (10)$$



This last integral can be discretized and approximately computed
using the times of observations of the process $p_i$. Therefore,
fixing a sufficiently large $N$, one can compute an estimator
$\hat{\Sigma}^{F}_{ij}$ of the covariance of the processes $i$ and
$j$. See Reno (2003) or Iori and Precup (2007), for examples of
empirical studies using this estimator.



2. The Hayashi-Yoshida estimator 



Hayashi and Yoshida (2005) have proposed a simple estimator in
order to compute covariance/correlation without any need for
synchronicity of time series. As in the Fourier estimator, it is
assumed that the observed process is a Brownian semi-martingale.
The time window of observation is easily partitioned into $d$
families of intervals $\Pi^i = (U^i_m)$, $i = 1, \dots, d$, where
$t^i_m = \inf\{U^i_{m+1}\}$ is the time of the $m$-th observation
of the process $i$. Let us denote
$\Delta p_i(U^i_m) = p_i(t^i_m) - p_i(t^i_{m-1})$. The cumulative
covariance estimator as the authors named it, or the
Hayashi-Yoshida estimator as it has been largely referred to, is
then built as follows:

$$\hat{\Sigma}^{HY}_{ij}(t) = \sum_{m,n} \Delta p_i(U^i_m)\, \Delta p_j(U^j_n)\, \mathbf{1}_{\{U^i_m \cap U^j_n \neq \emptyset\}}. \qquad (11)$$



There is a large literature in Econometrics that tackles the new
challenges posed by high-frequency data. We refer the reader,
wishing to go beyond this brief presentation, to the econometrics
reviews by Barndorff-Nielsen and Shephard (2007) or McAleer and
Medeiros (2008), for example.
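
As an illustration of the Hayashi-Yoshida estimator, the sketch below sums the products of price increments over all pairs of overlapping observation intervals, taking each interval as $(t_{m-1}, t_m]$. It is a direct double loop written for readability (quadratic in the number of observations), not an optimized or reference implementation, and the small data sets are made up. On synchronous data it reduces to the realized covariance, which gives a simple consistency check.

```python
def hayashi_yoshida_covariance(times_i, prices_i, times_j, prices_j):
    """Sum of increment products over all pairs of overlapping intervals.

    Interval m of a series is (t[m-1], t[m]]; two intervals overlap when
    the later start lies strictly before the earlier end.
    """
    total = 0.0
    for m in range(1, len(times_i)):
        dp_i = prices_i[m] - prices_i[m - 1]
        for n in range(1, len(times_j)):
            start = max(times_i[m - 1], times_j[n - 1])
            end = min(times_i[m], times_j[n])
            if start < end:
                total += dp_i * (prices_j[n] - prices_j[n - 1])
    return total

def realized_covariance(prices_i, prices_j):
    """Usual realized covariance on synchronously sampled series."""
    return sum((a1 - a0) * (b1 - b0)
               for a0, a1, b0, b1 in zip(prices_i, prices_i[1:],
                                         prices_j, prices_j[1:]))

# Synchronous toy data: both estimators coincide.
sync_hy = hayashi_yoshida_covariance([0, 1, 2], [0, 1, 3], [0, 1, 2], [0, 2, 5])
sync_rv = realized_covariance([0, 1, 3], [0, 2, 5])

# Asynchronous toy data: no resampling is needed.
async_hy = hayashi_yoshida_covariance([0, 2, 4], [0, 1, 3], [0, 3, 4], [0, 2, 5])
print(sync_hy, sync_rv, async_hy)
```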



B. Correlation matrix and Random Matrix Theory 

The stock market data being essentially a multivariate time
series data, we construct the correlation matrix to study its
spectra and contrast it with the random multivariate data from
coupled map lattice. It is known from previous studies that the
empirical spectra of correlation matrices drawn from time series
data, for the most part, follow random matrix theory (RMT, see
e.g. Gopikrishnan et al. (2001)).



1. Correlation matrix and Eigenvalue density 

a. Correlation matrix If there are $N$ assets with price
$P_i(t)$ for asset $i$ at time $t$, then the logarithmic return of
stock $i$ is $r_i(t) = \ln P_i(t) - \ln P_i(t-1)$, which for a
certain consecutive sequence of trading days forms the return
vector $r_i$. In order to characterize the synchronous time
evolution of stocks, the equal time correlation coefficient
between stocks $i$ and $j$ is defined as

$$\rho_{ij} = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{[\langle r_i^2 \rangle - \langle r_i \rangle^2][\langle r_j^2 \rangle - \langle r_j \rangle^2]}}, \qquad (12)$$

where $\langle \dots \rangle$ indicates a time average over the
trading days included in the return vectors. These correlation
coefficients form an $N \times N$ matrix with
$-1 \leq \rho_{ij} \leq 1$. If $\rho_{ij} = 1$, the stock price
changes are completely correlated; if $\rho_{ij} = 0$, the stock
price changes are uncorrelated, and if $\rho_{ij} = -1$, then the
stock price changes are completely anti-correlated.

b. Correlation matrix of spatio-temporal series from coupled map
lattices Consider a time series of the form $z'(x, t)$, where
$x = 1, 2, \dots, n$ and $t = 1, 2, \dots, p$ denote the discrete
space and time, respectively. In this, the time series at every
spatial point is treated as a different variable. We define the
normalised variable as

$$z(x, t) = \frac{z'(x, t) - \langle z'(x) \rangle}{\sigma(x)}, \qquad (13)$$

where the brackets $\langle \cdot \rangle$ represent temporal
averages and $\sigma(x)$ the standard deviation of $z'$ at position
$x$. Then, the equal-time cross-correlation matrix that represents
the spatial correlations can be written as

$$S_{x,x'} = \langle z(x, t)\, z(x', t) \rangle, \quad x, x' = 1, 2, \dots, n. \qquad (14)$$

The correlation matrix is symmetric by construction. In addition,
a large class of processes are translation invariant and the
correlation matrix can contain that additional symmetry too. We
will use this property for our correlation models in the context of
coupled map lattice. In time series analysis, the averages
$\langle \cdot \rangle$ have to be replaced by estimates obtained
from finite samples. As usual, we will use the maximum likelihood
estimates, $\langle a(t) \rangle \approx \frac{1}{p} \sum_{t=1}^{p} a(t)$.
These estimates contain statistical uncertainties, which disappear
for $p \to \infty$. Ideally, one requires $p \gg n$ to have
reasonably correct correlation estimates. See Chakraborti et al.
(2007) for details of parameters.

c. Eigenvalue Density The interpretation of the spectra of
empirical correlation matrices should be done carefully if one
wants to be able to distinguish between system specific signatures
and universal features. The former express themselves in the
smoothed level density, whereas the latter usually are represented
by the fluctuations on top of this smooth curve. In time series
analysis, the matrix elements are not only prone to uncertainty
such as measurement noise on the time series data, but also
statistical fluctuations due to finite sample effects. When
characterizing time series data in terms of random matrix theory,
one is not interested in these trivial sources of fluctuations
which are present on every data set, but one would like to identify
the significant features which would be shared, in principle, by an
"infinite" amount of data without measurement noise. The
eigenfunctions of the correlation matrices constructed from such
empirical time series carry the information contained in the
original time series data in a "graded" manner and they also
provide a compact representation for it. Thus, by applying an
approach based on random matrix theory, one tries to identify
non-random components of the correlation matrix spectra as
deviations from random matrix theory predictions (Gopikrishnan et
al. (2001)).

We will look at the eigenvalue density that has been studied in
the context of applying random matrix theory methods to time series
correlations. Let $\mathcal{N}(\lambda)$ be the integrated
eigenvalue density which gives the number of eigenvalues less than
a given value $\lambda$. Then, the eigenvalue or level density is
given by $\rho(\lambda) = \frac{d\mathcal{N}(\lambda)}{d\lambda}$.
This can be obtained assuming a random correlation matrix and is
found to be in good agreement with the empirical time series data
from stock market fluctuations. From Random Matrix Theory
considerations, the eigenvalue density for random correlations is
given by



$$\rho(\lambda) = \frac{Q}{2\pi\lambda} \sqrt{(\lambda_{max} - \lambda)(\lambda - \lambda_{min})}, \qquad (15)$$



where Q = N/T is the ratio of the number of variables 
to the length of each time series. Here, Xmax and Xmin, 
representing the maximum and minimum eigenvalues of 
the random correlation matrix respectively, are given by 
Xmax.min = 1 + I / Q ±2^/T/Q . Howcvcr, duc to prcscuce 
of correlations in the empirical correlation matrix, this 
eigenvalue density is often violated for a certain number 
of dominant eigenvalues. They often correspond to sys- 
tem specific information in the data. In Fig. [24] we show 
the eigenvalue density for S&P500 data and also for the 
chaotic data from coupled map lattice. Clearly, both 
curves are qualitatively different. Thus, presence or ab- 
sence of correlations in data is manifest in the spectrum 
of the corresponding correlation matrices. 
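The comparison described above can be sketched numerically. The snippet below is only an illustration under stated assumptions: it takes Q as the ratio of series length to number of series (Q = T/N >= 1, the convention of Laloux et al.), uses synthetic Gaussian data rather than market returns, and all function and variable names (`mp_bounds`, `mp_density`, `correlation_eigenvalues`) are hypothetical. It evaluates the random matrix bounds and density of Eq. (15) and checks how many eigenvalues of a correlated data set lie above the upper bound.

```python
import numpy as np

def mp_bounds(n_series, n_obs):
    """Random-matrix eigenvalue bounds, assuming Q = T/N >= 1."""
    q = n_obs / n_series
    spread = 2.0 * np.sqrt(1.0 / q)
    return 1.0 + 1.0 / q - spread, 1.0 + 1.0 / q + spread

def mp_density(lam, n_series, n_obs):
    """Eigenvalue density of Eq. (15); zero outside the bounds."""
    q = n_obs / n_series
    lam_min, lam_max = mp_bounds(n_series, n_obs)
    lam = np.asarray(lam, dtype=float)
    rho = np.zeros_like(lam)
    inside = (lam > lam_min) & (lam < lam_max)
    x = lam[inside]
    rho[inside] = q / (2.0 * np.pi * x) * np.sqrt((lam_max - x) * (x - lam_min))
    return rho

def correlation_eigenvalues(returns):
    """Sorted eigenvalues of the correlation matrix of a (T, N) array."""
    corr = np.corrcoef(returns, rowvar=False)
    return np.linalg.eigvalsh(corr)

# Synthetic data (an assumption): pure noise, and noise plus one common factor.
rng = np.random.default_rng(0)
n_series, n_obs = 100, 500
noise = rng.standard_normal((n_obs, n_series))
common = rng.standard_normal((n_obs, 1))

eig_noise = correlation_eigenvalues(noise)
eig_factor = correlation_eigenvalues(noise + 0.5 * common)

lam_min, lam_max = mp_bounds(n_series, n_obs)
# Eigenvalues above lam_max are candidates for non-random structure.
n_outside = int(np.sum(eig_factor > lam_max))
```

In this toy setting the common factor produces one eigenvalue far above the upper bound, while the bulk of the spectrum stays close to the random matrix prediction.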



2. Earlier estimates and studies using Random Matrix 
Theory 



FIG. 24. The upper panel shows spectral density for multivariate
spatio-temporal time series drawn from coupled map
lattices. The lower panel shows the eigenvalue density for the
return time series of the S&P500 stock market data (8938
time steps).

Laloux et al. (1999) showed that results from the random
matrix theory were useful to understand the statistical
structure of the empirical correlation matrices appearing
in the study of price fluctuations. The empirical
determination of a correlation matrix is a difficult task.
If one considers N assets, the correlation matrix contains
N(N - 1)/2 mathematically independent elements,
which must be determined from N time series of length
T. If T is not very large compared to N, then generally
the determination of the covariances is noisy, and
therefore the empirical correlation matrix is to a large
extent random. The smallest eigenvalues of the matrix
are the most sensitive to this 'noise'. But the eigenvectors
corresponding to these smallest eigenvalues determine
the minimum risk portfolios in Markowitz theory.
It is thus important to distinguish "signal" from "noise" 
or, in other words, to extract the eigenvectors and eigen- 
values of the correlation matrix containing real informa- 
tion (those important for risk control) , from those which 
do not contain any useful information and are unstable in 
time. It is useful to compare the properties of an empiri- 
cal correlation matrix to a "null hypothesis" — a random 
matrix which arises for example from a finite time series
of strictly uncorrelated assets. Deviations from the
random matrix case might then suggest the presence of 
true information. The main result of their study was the 
remarkable agreement between the theoretical prediction 
(based on the assumption that the correlation matrix is 
random) and empirical data concerning the density of 
eigenvalues (shown in Fig. 25) associated to the time
series of the different stocks of the S&P 500 (or other
stock markets).

Cross-correlations in financial data were
also studied by Plerou et al. (1999, 2002). They analysed
cross-correlations between price fluctuations of different
stocks using methods of RMT. Using two large
databases, they calculated cross-correlation matrices of
returns constructed from (i) 30-min returns of 1000 US
stocks for the 2-yr period 1994-95, (ii) 30-min returns
of 881 US stocks for the 2-yr period 1996-97, and (iii)
1-day returns of 422 US stocks for the 35-yr period 1962-96.
They also tested the statistics of the eigenvalues
$\lambda_i$ of cross-correlation matrices against a "null hypothesis".
They found that a majority of the eigenvalues
of the cross-correlation matrices were within the RMT
bounds $[\lambda_{\min}, \lambda_{\max}]$, as defined above, for the eigenvalues
of random correlation matrices. They also tested the
eigenvalues of the cross-correlation matrices within the
RMT bounds for universal properties of random matrices
and found good agreement with the results for the Gaussian
orthogonal ensemble (GOE) of random matrices,
implying a large degree of randomness in the measured
cross-correlation coefficients. Furthermore, they found
that the distribution of eigenvector components for the
eigenvectors corresponding to the eigenvalues outside the
RMT bounds displayed systematic deviations from the
RMT prediction and that these "deviating eigenvectors"
were stable in time. They analysed the components of the
deviating eigenvectors and found that the largest eigenvalue
corresponded to an influence common to all stocks.
Their analysis of the remaining deviating eigenvectors
showed distinct groups, whose identities corresponded to
conventionally-identified business sectors.

FIG. 25. Eigenvalue spectrum of the correlation matrices.
Adapted from Laloux et al. (1999).



C. Analyses of correlations and economic taxonomy 



1. Models and theoretical studies of financial correlations

Podobnik et al. (2000) studied how the presence of correlations
in physical variables contributes to the form of
probability distributions. They investigated a process
with correlations in the variance generated by a Gaussian
or a truncated Levy distribution. For both Gaussian
and truncated Levy distributions, they found that
due to the correlations in the variance, the process "dynamically"
generated power-law tails in the distributions,
whose exponents could be controlled through the way the
correlations in the variance were introduced. For a truncated
Levy distribution, the process could extend a truncated
distribution beyond the truncation cutoff, leading
to a crossover between a Levy stable power law and their
"dynamically-generated" power law. It was also shown
that the process could explain the crossover behavior observed
in the S&P 500 stock index.

Noh (2000) proposed a model for correlations in stock
markets in which the markets were composed of several
groups, within which the stock price fluctuations were
correlated. The spectral properties of empirical correlation
matrices (Plerou et al. (1999); Laloux et al. (1999))
were studied in relation to this model and the connection
between the spectral properties of the empirical correlation
matrix and the structure of correlations in stock
markets was established.

The correlation structure of extreme stock returns was
studied by Cizeau et al. (2001). It has been commonly
believed that the correlations between stock returns increased
in high volatility periods. They investigated how
much of these correlations could be explained within a
simple non-Gaussian one-factor description with time independent
correlations. Using surrogate data with the
true market return as the dominant factor, it was shown
that most of these correlations, measured by a variety of
different indicators, could be accounted for. In particular,
their one-factor model could explain the level and
asymmetry of empirical exceedance correlations. However,
more subtle effects required an extension of the one factor
model, where the variance and skewness of the residuals
also depended on the market return.

Burda et al. (2001) provided a statistical analysis of
three S&P 500 covariances with evidence for raw tail
distributions. They studied the stability of these tails
against reshuffling for the S&P 500 data and showed that
the covariance with the strongest tails was robust, with
a spectral density in remarkable agreement with random
Levy matrix theory. They also studied the inverse participation
ratio for the three covariances. The strong
localization observed at both ends of the spectral density
was analogous to the localization exhibited in the
random Levy matrix ensemble. They showed that the
stocks with the largest scattering were the least susceptible
to correlations and were the likely candidates for the
localized states.

2. Analyses using graph theory and economic taxonomy



Mantegna (1999) introduced a method for finding a hierarchical
arrangement of stocks traded in a financial market,
through studying the clustering of companies by using
correlations of asset returns. With an appropriate
metric, based on the earlier explained correlation matrix
coefficients $\rho_{ij}$ between all pairs of stocks i and
j of the portfolio, computed in Eq. 12 by considering
the synchronous time evolution of the difference of the
logarithm of daily stock price, a fully connected graph
was defined in which the nodes are companies, or stocks,
and the "distances" between them were obtained from
the corresponding correlation coefficients. The minimum
spanning tree (MST) was generated from the graph by
selecting the most important correlations and it was used
to identify clusters of companies. The hierarchical tree
of the sub-dominant ultrametric space associated with
the graph provided information useful to investigate the
number and nature of the common economic factors affecting
the time evolution of the logarithm of price of well
defined groups of stocks. Several other attempts have
been made to obtain clustering from the huge correlation
matrix.

Bonanno et al. (2001) studied the high-frequency
cross-correlation existing between pairs of stocks traded 
in a financial market in a set of 100 stocks traded in US 
equity markets. A hierarchical organization of the inves- 
tigated stocks was obtained by determining a metric dis- 
tance between stocks and by investigating the properties 
of the sub-dominant ultrametric associated with it. A 
clear modification of the hierarchical organization of the 
set of stocks investigated was detected when the time 
horizon used to determine stock returns was changed. 
The hierarchical location of stocks of the energy sector 
was investigated as a function of the time horizon. The 
hierarchical structure explored by the minimum spanning 
tree also seemed to give information about the influential 
power of the companies. 

It also turned out that the hierarchical structure of
the financial market could be identified in accordance
with the results obtained by an independent clustering
method, based on Potts super-paramagnetic transitions
as studied by Kullmann et al. (2000), where the
spins correspond to companies and the interactions are
functions of the correlation coefficients determined from
the time dependence of the companies' individual stock
prices. The method is a generalization of the clustering
algorithm by Blatt et al. (1996) to the case of
anti-ferromagnetic interactions corresponding to anti-correlations.
For the Dow Jones Industrial Average, no
anti-correlations were observed in the investigated time
period and the previous results obtained by different tools
were well reproduced. For the S&P 500, where anti-correlations
occur, repulsion between stocks modified the
cluster structure of the N = 443 companies studied, as
shown in Fig. 26. The efficiency of the method is represented
by the fact that the figure matches well with the
corresponding result obtained by the minimal spanning
tree method, including the specific composition of the
clusters. For example, at the lowest level of the hierarchy
(highest temperature in the super-paramagnetic phase)
the different industrial branches can be clearly identified:
oil, electricity, gold mining, etc. companies build
separate clusters.

FIG. 26. The hierarchical structure of clusters of the S&P
500 companies in the ferromagnetic case. In the boxes the
number of elements of the cluster are indicated. The clusters
consisting of single companies are not indicated. Adapted
from Kullmann et al. (2000).

The network of influence was investigated
by means of a time-dependent correlation method
by Kullmann et al. (2000). They studied the correlations
as a function of the time shift between pairs of stock
return time series of tick-by-tick data of the NYSE. They
investigated whether any "pulling effect" between stocks
existed or not, i.e. whether at any given time the return
value of one stock influenced that of another stock
at a different time or not. They found that, in general,
two types of mechanisms generated significant correlation
between any two given stocks. One was some kind of external
effect (say, economic or political news) that influenced
both stock prices simultaneously, and the change
for both prices appeared at the same time, such that
the maximum of the correlation was at zero time shift.
The second was that one of the companies had an
influence on the other company, indicating that one company's
operation depended on the other, so that the price
change of the influenced stock appeared later because it
required some time to react to the price change of the
first stock, displaying a "pulling effect". A weak but significant
effect with the real data set was found, showing
that in many cases the maximum correlation was at non-zero
time shift, indicating directions of influence between
the companies, and the characteristic time was of the
order of a few minutes, which was compatible with the
efficient market hypothesis. In the pulling effect, they found
that in general, more important companies (which were
traded more) pulled the relatively smaller companies.
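The idea of locating the maximum of the correlation as a function of the time shift can be illustrated with a small sketch. This is not the estimator used in the cited study; it is a plain sample cross-correlation on synthetic, evenly sampled series, and the names (`lagged_correlation`, `leader`, `follower`) as well as the built-in delay of three steps are assumptions made for the example.

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Sample correlation between x(t) and y(t + lag), lag = -max_lag..max_lag."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    corr = np.empty(len(lags))
    for k, lag in enumerate(lags):
        if lag >= 0:
            a, b = x[: n - lag], y[lag:]      # y observed `lag` steps after x
        else:
            a, b = x[-lag:], y[: n + lag]     # y observed before x
        corr[k] = np.mean(a * b)
    return lags, corr

# Synthetic example: `follower` reacts to `leader` with a delay of 3 steps.
rng = np.random.default_rng(1)
leader = rng.standard_normal(5000)
follower = rng.standard_normal(5000)
follower[3:] += 0.3 * leader[:-3]

lags, corr = lagged_correlation(leader, follower, max_lag=10)
best_lag = int(lags[np.argmax(corr)])   # positive value: leader "pulls" follower
```

A maximum at a positive lag indicates that the first series leads the second, which is the signature of the "pulling effect" described in the text; a maximum at zero lag corresponds to a simultaneous external influence.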

The time dependent properties of the minimum spanning
tree (introduced by Mantegna), called a 'dynamic
asset tree', were studied by Onnela et al. (2003b). The
nodes of the tree were identified with stocks and the dis- 
tance between them was a unique function of the corre- 
sponding element of the correlation matrix. By using the 
concept of a central vertex, chosen as the most strongly 
connected node of the tree, the mean occupation layer 
was defined, which was an important characteristic of 
the tree. During crashes the strong global correlation in 
the market manifested itself by a low value of the mean
occupation layer. The tree seemed to have a scale free
structure where the scaling exponent of the degree distribution
was different for 'business as usual' and 'crash'
periods. The basic structure of the tree topology was
very robust with respect to time. Let us discuss in more
detail how the dynamic asset tree was applied to studies
of economic taxonomy.

a. Financial Correlation matrix and constructing Asset
Trees. Two different sets of financial data were used.
The first set from the Standard & Poor's 500 index
(S&P500) of the New York Stock Exchange (NYSE)
from July 2, 1962 to December 31, 1997 contained 8939
daily closing values. The second set recorded the split-adjusted
daily closure prices for a total of N = 477 stocks
traded at the New York Stock Exchange (NYSE) over
the period of 20 years, from 02-Jan-1980 to 31-Dec-1999.
This amounted to a total of 5056 prices per stock, indexed
by time variable $\tau = 1, 2, \ldots, 5056$. For analysis and
smoothing purposes, the data was divided time-wise into
M windows $t = 1, 2, \ldots, M$ of width T, where T corresponded
to the number of daily returns included in the
window. Note that several consecutive windows overlap
with each other, the extent of which is dictated by
the window step length parameter $\delta T$, which describes
the displacement of the window and is also measured in
trading days. The choice of window width is a trade-off
between too noisy and too smoothed data for small and
large window widths, respectively. The results presented
here were calculated from monthly stepped four-year windows,
i.e. $\delta T = 250/12 \approx 21$ days and T = 1000 days.
A large range of different values for both parameters was
explored, and the cited values were found optimal (Onnela
(2000)). With these choices, the overall number of windows
is M = 195.
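The windowing procedure can be sketched as follows. This is a minimal illustration on synthetic log-prices, assuming an integer step and windows that must fit entirely inside the return series; the exact convention that leads to the quoted number of windows is not spelled out in the text, so the count produced by this sketch need not coincide with it. The names `window_starts` and `rolling_correlations` are hypothetical.

```python
import numpy as np

def window_starts(n_returns, width, step):
    """Start indices of all full windows of `width` returns, displaced by `step`."""
    return list(range(0, n_returns - width + 1, step))

def rolling_correlations(log_prices, width, step):
    """Yield (start index, correlation matrix) for each overlapping window.

    log_prices: array of shape (number of prices, number of stocks).
    """
    returns = np.diff(log_prices, axis=0)          # daily log-returns
    for start in window_starts(len(returns), width, step):
        window = returns[start:start + width]
        yield start, np.corrcoef(window, rowvar=False)

# Synthetic random-walk log-prices for 5 hypothetical stocks.
rng = np.random.default_rng(2)
log_prices = 0.01 * rng.standard_normal((300, 5)).cumsum(axis=0)

windows = list(rolling_correlations(log_prices, width=100, step=21))
```

Each element of `windows` holds one correlation matrix, which plays the role of the window-indexed correlation matrix used to build the asset trees.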

The earlier definition of correlation matrix, given by
Eq. 12, is used. These correlation coefficients form an
N x N correlation matrix $C^t$, which serves as the basis for
trees discussed below. An asset tree is then constructed
according to the methodology by Mantegna (1999). For
the purpose of constructing asset trees, a distance is defined
between a pair of stocks. This distance is associated
with the edge connecting the stocks and it is expected to
reflect the level at which the stocks are correlated. A
simple non-linear transformation

$$d^t_{ij} = \sqrt{2\,(1 - \rho^t_{ij})}$$

is used to obtain distances with the property $2 \ge d^t_{ij} \ge 0$,
forming an N x N symmetric distance matrix $D^t$. So,
if $d^t_{ij} = 0$, the stock price changes are completely correlated;
if $d^t_{ij} = 2$, the stock price changes are completely
anti-correlated. The trees for different time
windows are not independent of each other, but form
a series through time. Consequently, this multitude of
trees is interpreted as a sequence of evolutionary steps
of a single dynamic asset tree. An additional hypothesis
is required about the topology of the metric space:
the ultrametricity hypothesis. In practice, it leads to
determining the minimum spanning tree (MST) of the
distances, denoted $T^t$. The spanning tree is a simply
connected acyclic (no cycles) graph that connects all N
nodes (stocks) with N - 1 edges such that the sum of
all edge weights, $\sum_{d^t_{ij} \in T^t} d^t_{ij}$, is minimum. We refer to
the minimum spanning tree at time t by the notation
$T^t = (V, E^t)$, where V is a set of vertices and $E^t$ is a corresponding
set of unordered pairs of vertices, or edges.
Since the spanning tree criterion requires all N nodes to
be always present, the set of vertices V is time independent,
which is why the time superscript has been dropped
from notation. The set of edges $E^t$, however, does depend
on time, as it is expected that edge lengths in the
matrix $D^t$ evolve over time, and thus different edges get
selected in the tree at different times.
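The construction of an asset tree from one correlation matrix can be sketched as below. The text does not say which minimum spanning tree algorithm was used; Prim's algorithm on the dense distance matrix is chosen here purely for illustration, the distance is taken as the square root of 2(1 - rho), and the small 4 x 4 correlation matrix and the function names are hypothetical.

```python
import numpy as np

def correlation_to_distance(corr):
    """Map correlations in [-1, 1] to distances in [0, 2]."""
    return np.sqrt(2.0 * (1.0 - np.clip(corr, -1.0, 1.0)))

def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense symmetric distance matrix.

    Returns a list of N - 1 edges (i, j, distance).
    """
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best_dist = dist[0].copy()            # cheapest known link to the tree
    best_from = np.zeros(n, dtype=int)    # tree node providing that link
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best_dist)
        j = int(np.argmin(candidates))
        i = int(best_from[j])
        edges.append((i, j, float(dist[i, j])))
        in_tree[j] = True
        closer = (dist[j] < best_dist) & ~in_tree
        best_dist[closer] = dist[j][closer]
        best_from[closer] = j
    return edges

# Toy correlation matrix: stocks 0-1 and 2-3 form two correlated pairs.
corr = np.array([
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.2],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
])
dist = correlation_to_distance(corr)
tree_edges = minimum_spanning_tree(dist)
total_length = sum(d for _, _, d in tree_edges)
```

The two strongly correlated pairs end up directly linked in the tree, and a single longer edge connects the two groups, which is how clusters of companies become visible in the tree structure.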

b. Market characterization. We plot the distribution
of (i) distance elements $d^t_{ij}$ contained in the distance matrix
$D^t$ (Fig. 27), (ii) distance elements $d_{ij}$ contained
in the asset (minimum spanning) tree $T^t$ (Fig. 28). In
both plots, but most prominently in Fig. 27, there appears
to be a discontinuity in the distribution between
roughly 1986 and 1990. The part that has been cut out,
pushed to the left and made flatter, is a manifestation of
Black Monday (October 19, 1987), and its length along
the time axis is related to the choice of window width
T (Onnela et al. (2003a,b)). Also, note that in the distribution
of tree edges in Fig. 28 most edges included in
the tree seem to come from the area to the right of the
value 1.1 in Fig. 27, and the largest distance element is
$d_{\max} = 1.3549$.

FIG. 27. Distribution of all N(N - 1)/2 distance elements $d_{ij}$
contained in the distance matrix $D^t$ as a function of time.

Tree occupation and central vertex. Let us focus
on characterizing the spread of nodes on the tree, by
introducing the quantity of mean occupation layer

$$l(t, v_c) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{lev}(v_i^t), \qquad (16)$$

where $\mathrm{lev}(v_i)$ denotes the level of vertex $v_i$. The levels,
not to be confused with the distances $d_{ij}$ between nodes,
are measured in natural numbers in relation to the central
vertex $v_c$, whose level is taken to be zero. Here the mean
occupation layer indicates the layer on which the mass 
of the tree, on average, is conceived to be located. The 
central vertex is considered to be the parent of all other 
nodes in the tree, and is also known as the root of the 
tree. It is used as the reference point in the tree, against 
which the locations of all other nodes are relative. Thus 
all other nodes in the tree are children of the central 
vertex. Although there is an arbitrariness in the choice 
of the central vertex, it is proposed that the vertex is 
central, in the sense that any change in its price strongly 
affects the course of events in the market on the whole. 
Three alternative definitions for the central vertex were 
proposed in the studies, all yielding similar and, in most 
cases, identical outcomes. The idea is to find the node 
that is most strongly connected to its nearest neighbors. 
For example, according to one definition, the central node 
is the one with the highest vertex degree, i.e. the number 
of edges which are incident with (neighbor of) the vertex.
Also, one may have either (i) static (fixed at all times) or 
(ii) dynamic (updated at each time step) central vertex, 
but again the results do not seem to vary significantly. 
The variation of the topological properties
and nature of the trees with time was also studied.
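The mean occupation layer can be computed with a breadth-first traversal from the central vertex. The sketch below assumes the tree is given as a list of undirected edges on nodes labelled 0 to N - 1, and uses the highest vertex degree criterion, one of the definitions of the central vertex mentioned above; the function names are hypothetical.

```python
from collections import deque

def vertex_levels(n_nodes, edges, central):
    """Level of every vertex, i.e. number of edges on its path to `central`."""
    adjacency = {v: [] for v in range(n_nodes)}
    for i, j in edges:
        adjacency[i].append(j)
        adjacency[j].append(i)
    levels = {central: 0}
    queue = deque([central])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w not in levels:
                levels[w] = levels[v] + 1
                queue.append(w)
    return levels

def mean_occupation_layer(n_nodes, edges, central):
    """Average level over all N vertices, as in Eq. (16)."""
    levels = vertex_levels(n_nodes, edges, central)
    return sum(levels.values()) / n_nodes

def highest_degree_vertex(n_nodes, edges):
    """Central vertex chosen by the highest vertex degree criterion."""
    degree = [0] * n_nodes
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return max(range(n_nodes), key=degree.__getitem__)

# Two extreme toy trees on 5 nodes: a star and a chain.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]

star_layer = mean_occupation_layer(5, star, highest_degree_vertex(5, star))
chain_layer = mean_occupation_layer(5, chain, central=0)
```

The star, where every node sits next to the central vertex, gives a low mean occupation layer, while the chain gives a high one; this is the sense in which a low value signals a tree that has shrunk around its centre.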



FIG. 28. Distribution of the (N - 1) distance elements $d_{ij}$
contained in the asset (minimum spanning) tree $T^t$ as a function
of time.

FIG. 29. Snapshot of a dynamic asset tree connecting the
examined 116 stocks of the S&P 500 index. The tree was
produced using four-year window width and it is centered on
January 1, 1998. Business sectors are indicated according
to Forbes (www.forbes.com). In this tree, General Electric
(GE) was used as the central vertex and eight layers can be
identified.

Economic taxonomy. Mantegna's idea of linking
stocks in an ultrametric space was motivated a posteriori
by the property of such a space to provide a meaningful
economic taxonomy (Onnela et al. (2002)). Mantegna
examined the meaningfulness of the taxonomy, by comparing
the grouping of stocks in the tree with a third
party reference grouping of stocks e.g. by their industry
classifications (Mantegna (1999)). In this case, the reference
was provided by Forbes (www.forbes.com), which
uses its own classification system, assigning each stock
with a sector (higher level) and industry (lower level)
category. In order to visualize the grouping of stocks,
a sample asset tree is constructed for a smaller dataset
(shown in Fig. 29), which consists of 116 S&P 500 stocks,
extending from the beginning of 1982 to the end of
2000, resulting in a total of 4787 price quotes per stock
(Onnela et al. (2003b)). The window width was set at
T = 1000, and the shown sample tree is located time-wise
at t = t*, corresponding to 1.1.1998. The stocks in
this dataset fall into 12 sectors, which are Basic Materials,
Capital Goods, Conglomerates, Consumer/Cyclical,
Consumer/Non-Cyclical, Energy, Financial, Healthcare,
Services, Technology, Transportation and Utilities. The
sectors are indicated in the tree (see Fig. 29) with different
markers, while the industry classifications are omitted
for reasons of clarity. The term sector is used exclusively
to refer to the given third party classification
system of stocks. The term branch refers to a subset of
the tree, to all the nodes that share the specified common
parent. In addition to the parent, a reference point is
needed to indicate the generational direction
(i.e. who is whose parent) in order for a branch to
be well defined. Without this reference there is absolutely
no way to determine where one branch ends and
the other begins. In this case, the reference is the central
node. There are some branches in the tree, in which most
of the stocks belong to just one sector, indicating that
the branch is fairly homogeneous with respect to business
sectors. This finding is in accordance with those of
Mantegna (1999), although there are branches that are
fairly heterogeneous, such as the one extending directly 
downwards from the central vertex (see Fig.

This first part of our review has shown statistical properties
of financial data (time series of prices, order book
structure, asset correlations). Some of these properties,
such as fat tails of returns or volatility clustering, are
widely known and acknowledged as "financial stylized
facts". They are now largely cited in order to compare
financial models, and reveal the shortcomings of many classical
stochastic models of financial assets. Some other properties
are newer findings that are obtained by studying
high-frequency data of the whole order book structure.
Volume of orders, interval time between orders, intraday
seasonality, etc. are essential phenomena to be
understood when working in financial modelling. The
important role of studies of correlations has been emphasized.
Besides the technical challenges raised by high-frequency
data, many studies based for example on random
matrix theory or clustering algorithms help to get a better
grasp on some Economics problems. It is our belief
that future modelling in finance will have to be partly 
based on Econophysics work on agent-based models in 
order to incorporate these "stylized facts" in a compre- 
hensive way. Agent-based reasoning for order book mod- 
els, wealth exchange models and game theoretic models 
will be reviewed in the following part of the review, to 
appear in a following companion paper. 



Part II 

VI. INTRODUCTION 

In the first part of the review, empirical developments 
in Econophysics have been studied. We have pointed 
out that some of these widely known "stylized facts" 
are already at the heart of financial models. But many 
facts, especially the newer statistical properties of order
books, are not yet taken into account. As advocated
by many during the financial crisis in 2007-2008
(see e.g. Bouchaud (2008); Lux and Westerhoff (2009);
Farmer and Foley (2009)), agent-based models should
have a great role to play in future financial modelling.
In economic models, there is usually the representative
agent, who is "perfectly rational" and uses the "utility
maximization" principle while taking actions. Instead,
the multi-agent models that have originated from statistical
physics considerations have allowed one to go beyond
the prototype theories with the "representative" agent in
traditional economics. In this second part of our review,
we present recent developments of agent-based models in 



VII. AGENT-BASED MODELLING OF ORDER BOOKS 

A. Introduction 

Although known, at least partly, for a long time -
Mandelbrot (1963) gives a reference for a paper dealing
with non-normality of price time series in 1915, followed
by several others in the 1920's - "stylized facts" have
often been left aside when modelling financial markets.
They were even often referred to as "anomalous" characteristics,
as if observations failed to comply with theory.
Much has been done these past fifteen years in order to
address this challenge and provide new models that can
reproduce these facts. These recent developments have
been built on top of early attempts at modelling mechanisms
of financial markets with agents. For example,
Stigler (1964), investigating some rules of the SEC, or
Garman (1976), investigating double-auction microstructure,
belong to those historical works. It seems that the
first modern attempts at that type of models were made
in the field of behavioural finance. This field aims at
improving financial modelling based on the psychology
and sociology of the investors. Models are built with
agents who can exchange shares of stocks according to
exogenously defined utility functions reflecting their preferences
and risk aversions. LeBaron (2006b) shows that
this type of modelling offers good flexibility for reproducing
some of the stylized facts and LeBaron (2006a)
provides a review of that type of model. However, although
achieving some of their goals, these models suffer
from many drawbacks: first, they are very complex,
and it may be a very difficult task to identify the role of
their numerous parameters and the types of dependence
on these parameters; second, the chosen utility functions
do not necessarily reflect what is observed in the mechanisms
of a financial market.

A significant change in modelling appears with much
simpler models implementing only well-identified and
presumably realistic "behaviour": Cont and Bouchaud
(2000) use noise traders that are subject to "herding",
i.e. form random clusters of traders sharing the same
view on the market. The idea is used in Raberto et al.
(2001) as well. A complementary approach is to characterize
traders as fundamentalists, chartists or noise
traders. Lux and Marchesi (2000) propose an agent-based
model in which these types of traders interact. In
all these models, the price variation directly results from
the excess demand: at each time step, all agents submit
orders and the resulting price is computed. Therefore,
everything is cleared at each time step and there is no
order book structure to keep track of orders.

One big step is made with models really taking into
account limit orders and keeping them in an order book
once submitted and not executed. Chiarella and Iori
(2002) build an agent-based model where all traders submit
orders depending on the three elements identified
in Lux and Marchesi (2000): chartists, fundamentalists,
noise. Orders submitted are then stored in a persistent
order book. In fact, one of the first simple models with
this feature was proposed in Bak et al. (1997). In this
model, orders are particles moving along a price line, and
each collision is a transaction. Due to numerous caveats
in this model, the authors propose in the same paper an
extension with fundamentalist and noise traders in the
spirit of the models previously evoked. Maslov (2000)
goes further in the modelling of trading mechanisms by
taking into account fixed limit orders and market orders
that trigger transactions, and really simulating the order
book. This model was analytically solved using a
mean-field approximation by Slanina (2001).

Following this trend of modelling, the more or less
"rational" agents composing models in economics tend
to vanish and be replaced by the notion of flows: orders
are not submitted any more by an agent following
a strategic behaviour, but are viewed as an arriving
flow whose properties are to be determined by empirical
observations of market mechanisms. Thus, the
modelling of order books calls for more "stylized facts",
i.e. empirical properties that could be observed on a
large number of order-driven markets. Biais et al.
is a thorough empirical study of the order flows in the
Paris Bourse a few years after its complete computerization.
Market orders, limit orders, time of arrivals
and placement are studied. Bouchaud et al. (2002) and
Potters and Bouchaud (2003) provide statistical features
on the order book itself. These empirical studies, that
have been reviewed in the first part of this review, are
the foundation for "zero-intelligence" models, in which
"stylized facts" are expected to be reproduced by the
properties of the order flows and the structure of order
book itself, without considering exogenous "rationality".
Challet and Stinchcombe (2001) propose a simple model
of order flows: limit orders are deposited in the order
book and can be removed if not executed, in a simple
deposition-evaporation process. Bouchaud et al. (2002)
use this type of model with empirical distribution as inputs.
As of today, the most complete empirical model
is to our knowledge Mike and Farmer (2008), where order
placement and cancellation models are proposed
and fitted on empirical data. Finally, new challenges
arise as scientists try to identify simple mechanisms
that allow an agent-based model to reproduce non-trivial
behaviours: herding behaviour in Cont and Bouchaud
(2000), dynamic price placement in Preis et al. (2007),
threshold behaviour in Cont (2007), etc.
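To make the deposition-evaporation idea concrete, here is a deliberately crude toy simulation. It is not the specification of any of the cited models: the probabilities, the uniform placement rule, the unit order size and every name (`simulate_book`, `p_market`, `p_cancel`, `max_offset`) are assumptions chosen only to show the three ingredients mentioned above, namely deposition of limit orders, evaporation (cancellation) and execution by market orders.

```python
import random

def simulate_book(n_steps, seed=0, p_market=0.2, p_cancel=0.01,
                  max_offset=10, start_mid=1000):
    """Toy zero-intelligence order flow on an integer price grid.

    Returns the mid-price path and the final lists of bid and ask prices
    (each list entry is one resting limit order of unit size).
    """
    rng = random.Random(seed)
    bids, asks = [], []
    last_mid = start_mid
    mids = []
    for _ in range(n_steps):
        # Evaporation: each resting order is cancelled with probability p_cancel.
        bids = [p for p in bids if rng.random() >= p_cancel]
        asks = [p for p in asks if rng.random() >= p_cancel]

        is_buy = rng.random() < 0.5
        if rng.random() < p_market:
            # Market order: removes the best quote on the opposite side.
            if is_buy and asks:
                asks.remove(min(asks))
            elif not is_buy and bids:
                bids.remove(max(bids))
        else:
            # Deposition: a limit order placed at a random distance from the
            # best opposite quote, so that bids always stay below asks.
            offset = rng.randint(1, max_offset)
            if is_buy:
                reference = min(asks) if asks else last_mid + 1
                bids.append(reference - offset)
            else:
                reference = max(bids) if bids else last_mid - 1
                asks.append(reference + offset)

        if bids and asks:
            last_mid = (max(bids) + min(asks)) // 2
        mids.append(last_mid)
    return mids, bids, asks

mids, bids, asks = simulate_book(2000, seed=1)
```

Even such a minimal flow-driven book produces a fluctuating mid-price without any strategic behaviour; whether it reproduces a given stylized fact would have to be checked against the empirical properties reviewed in the first part.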

In this part we review some of these models. This sur- 
vey is of course far from exhaustive, and we have just 
selected models that we feel are representative of a spe- 
cific trend of modelling. 



B. Early order-driven market modelling: Market 
microstructure and policy issues 

The pioneering works in simulation of financial markets aimed to study
market regulations. The very first one, Stigler (1964), investigates the
effect of SEC regulations on American stock markets, using empirical
data from the 1920s and the 1950s. Twenty years later, at the start of
the computerization of financial markets, Hakansson et al. (1985)
implement a simulator in order to test the feasibility of automated
market making. Instead of reviewing the huge microstructure literature,
we refer the reader to the well-known books by O'Hara (1995) or
Hasbrouck (2007), for example, for a panorama of this branch of finance.
However, by presenting a small selection of early models, we here
underline the grounding of recent order book modelling.



1. A pioneer order book model 

To our knowledge, the first attempt to simulate a financial market was
by Stigler (1964). This paper was a biting and controversial reaction to
the Report of the Special Study of the Securities Markets of the SEC
(Cohen (1963a)), whose aim was to "study the adequacy of rules of the
exchange and that the New York stock exchange undertakes to regulate its
members in all of their activities" (Cohen (1963b)). According to
Stigler, this SEC report lacks rigorous tests when investigating the
effects of regulation on financial markets. Stating that "demand and
supply are [...] erratic flows with sequences of bids and asks dependent
upon the random circumstances of individual traders", he proposes a
simple simulation model to investigate the evolution of the market. In
this model, limited by the simulation capability available in 1964, the
price is constrained to a grid of L = 10 ticks. (Limit) orders are
randomly drawn, in trade time, as follows: they can be bid or ask orders
with equal probability, and their price level is uniformly distributed
on the price grid. Each time an order crosses the opposite best quote,
it is a market order. All orders are of size one. Orders not executed
N = 25 time steps after their submission are cancelled. Thus, N is the
maximum number of orders available in the order book.
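The rules above can be sketched in a few lines of Python. This is only an illustrative reading of the description, not Stigler's original procedure: the function name `simulate_stigler`, the tie-breaking rule (oldest order first at a given price) and the convention that a crossing order trades at the resting quote are assumptions made for the sketch.

```python
import random

def simulate_stigler(n_steps=2000, L=10, N=25, seed=0):
    """Return (trade_prices, bids, asks) for a Stigler-like random order flow."""
    rng = random.Random(seed)
    bids, asks = [], []          # resting orders stored as (price, submission_time)
    trades = []
    for t in range(n_steps):
        # Cancel orders that have waited N time steps without being executed.
        bids = [o for o in bids if t - o[1] < N]
        asks = [o for o in asks if t - o[1] < N]
        price = rng.randint(1, L)             # uniform on the grid {1, ..., L}
        if rng.random() < 0.5:                # incoming bid
            best_ask = min(asks) if asks else None   # lowest price, oldest first
            if best_ask is not None and price >= best_ask[0]:
                asks.remove(best_ask)         # crossing order: trade at the best ask
                trades.append(best_ask[0])
            else:
                bids.append((price, t))
        else:                                 # incoming ask
            best_bid = max(bids, key=lambda o: (o[0], -o[1])) if bids else None
            if best_bid is not None and price <= best_bid[0]:
                bids.remove(best_bid)         # crossing order: trade at the best bid
                trades.append(best_bid[0])
            else:
                asks.append((price, t))
    return trades, bids, asks
```

Since at most one order is stored per time step and each order lives fewer than N steps, the book never holds more than N orders, which is the bound stated in the text.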

In the original paper, a run of a hundred trades was manually computed
using tables of random numbers. Of course, no particular results
concerning the "stylized facts" of financial time series were expected
at that time. However, in his review of some order book models, Slanina
(2008) simulates a similar model, with parameters L = 5000 and N = 5000,
and shows that price returns are not Gaussian: their distribution
exhibits a power law with exponent 0.3, far from empirical data. As
expected, the limitation L is responsible for a sharp cut-off of the
tails of this distribution.



2. Microstructure of the double auction 



Garman (1976) provides an early study of the double auction market with
a point of view that does not ignore temporal structure, and really
defines order flows. Price is discrete and constrained to lie on a
finite grid of L levels. Buy and sell orders are assumed to be submitted
according to two Poisson processes of intensities $\lambda$ and $\mu$.
Each time an order crosses the best opposite quote, it is a market
order. All quantities are assumed to be equal to one. The aim of the
author was to provide an empirical study of the market microstructure.
The main result of his Poisson model was to support the idea that
negative correlation of consecutive price changes is linked to the
microstructure of the double auction exchange. This paper is very
interesting because it can be seen as a precursor that clearly sets the
challenges of order book modelling. First, the mathematical formulation
is promising. With its fixed constrained prices, Garman (1976) can
define the state of the order book at a given time as the vector
$(n_i)_{i=1,\ldots,L}$ of awaiting orders (negative quantity for bid
orders, positive for ask orders). Future analytical models will use
similar vector formulations that can be cast into known mathematical
processes in order to extract analytical results - see e.g. Cont et al.
(2008) reviewed below. Second, the author points out that, although the
Poisson model is simple, an analytical solution is hard to work out, and
he provides Monte Carlo simulations. The need for numerical and
empirical developments is a constant in all following models. Third, the
structural question is clearly asked in the conclusion of the paper:
"Does the auction-market model imply the characteristic leptokurtosis
seen in empirical security price changes?". The computerization of
markets that was about to take place when this research was published -
Toronto's CATS opened a year later in 1977 - motivated many following
papers on the subject. As an example, let us cite here Hakansson et al.
(1985), who built a model to choose the right mechanism for setting
clearing prices in a multi-securities market.



grid of possible prices. Traders do not observe the market here and do
not act according to a given strategy. Thus, these two contributions
clearly belong to a class of "zero-intelligence" models. To our
knowledge, Gode and Sunder (1993) is the first paper to introduce the
expression "zero-intelligence" in order to describe non-strategic
behaviour of traders. It is applied to traders that submit random orders
in a double auction market. The expression has since been widely used in
agent-based modelling, sometimes with a slightly different meaning (see
more recent models described in this review). In Gode and Sunder (1993),
two types of zero-intelligence traders are studied. The first are
unconstrained zero-intelligence traders. These agents can submit random
orders at random prices, within the allowed price range {1, ..., L}. The
second are constrained zero-intelligence traders. These agents submit
random orders as well, but with the constraint that they cannot cross
their given reference price $p_i^R$: constrained zero-intelligence
traders are not allowed to buy or sell at a loss. The aim of the authors
was to show that double auction markets exhibit an intrinsic "allocative
efficiency" (ratio of the total profit earned by the traders to the
maximum possible profit) even with zero-intelligence traders. An
interesting fact is that in this experiment, price series resulting from
actions by unconstrained zero-intelligence traders are much more
volatile than the ones obtained with constrained traders. This fact will
be confirmed in future models where "fundamentalist" traders, having a
reference price, are expected to stabilize the market (see Wyart and
Bouchaud (2007) or Lux and Marchesi (2000) below). Note that the results
have been criticized by Cliff and Bruten (1997), who show that the
observed convergence of the simulated price towards the theoretical
equilibrium price may be an artefact of the model. More precisely, the
choice of traders' demand carries a lot of constraints that alone
explain the observed results.

Modern works in Econophysics owe a lot to these early models or
contributions. Starting in the mid-1990s, physicists have proposed
simple order book models directly inspired by Physics, where the analogy
"order = particle" is emphasized. Three main contributions are presented
in the next section.



C. Order-driven market modelling in Econophysics 



3. Zero-intelligence 



In the models by Stigler (1964) and Garman (1976),
orders are submitted in a purely random way on the

1. The order book as a reaction-diffusion model

A very simple model directly taken from Physics was presented in Bak et
al. (1997). The authors consider a market with N noise traders able to
exchange one share of stock at a time. Price p(t) at time t is
constrained to be an integer (i.e. price is quoted in number of ticks)
with an upper bound $\bar{p}$: $\forall t,\ p(t) \in \{0, \ldots, \bar{p}\}$.
Simulation is initiated at time 0 with half of the agents asking for one
share of stock (buy orders, bid) with price:

    p_b^j(0) \in \{0, \ldots, \bar{p}/2\}, \quad j = 1, \ldots, N/2,    (17)

and the other half offering one share of stock (sell orders, ask) with
price:

    p_s^j(0) \in \{\bar{p}/2, \ldots, \bar{p}\}, \quad j = 1, \ldots, N/2.    (18)

At each time step t, agents revise their offer by exactly one tick, with
equal probability to go up or down. Therefore, at time t, each seller
(resp. buyer) agent chooses his new price as:

    p_s^j(t+1) = p_s^j(t) \pm 1 \quad \text{(resp. } p_b^j(t+1) = p_b^j(t) \pm 1 \text{)}.    (19)

A transaction occurs when there exists $(i, j) \in \{1, \ldots, N/2\}^2$
such that $p_b^i(t+1) = p_s^j(t+1)$. In such a case the orders are
removed and the transaction price is recorded as the new price p(t).
Once a transaction has been recorded, two orders are placed at the
extreme positions on the grid: $p_b^i(t+1) = 0$ and
$p_s^j(t+1) = \bar{p}$. As a consequence, the number of orders in the
order book remains constant and equal to the number of agents. Figure 30
gives an illustration of these moving particles.
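A minimal Python sketch of these dynamics is given below. It departs from the description in three labelled ways, all of which are assumptions of the sketch rather than features of Bak et al. (1997): one randomly chosen order is moved per elementary step (so that a buyer and a seller cannot jump over each other without meeting), moves that would leave the grid are ignored, and the initial buy and sell prices are drawn on strictly separated halves of the grid. The name `simulate_bps` is hypothetical.

```python
import random

def simulate_bps(n_agents=50, p_max=100, n_moves=20000, seed=0):
    """Return (trade_prices, bids, asks) for a particle-like order book."""
    rng = random.Random(seed)
    half = n_agents // 2
    # Assumption: strictly separated initial halves, so bids < asks at time 0.
    bids = [rng.randint(0, p_max // 2 - 1) for _ in range(half)]
    asks = [rng.randint(p_max // 2 + 1, p_max) for _ in range(half)]
    prices = []
    for _ in range(n_moves):
        step = rng.choice((-1, 1))            # one tick up or down, equal probability
        j = rng.randrange(half)
        if rng.random() < 0.5:                # move one buy order
            bids[j] = min(max(bids[j] + step, 0), p_max)
            if bids[j] in asks:               # collision with a sell order
                k = asks.index(bids[j])
                prices.append(bids[j])        # record the transaction price
                bids[j], asks[k] = 0, p_max   # reinsert both orders at the ends
        else:                                 # move one sell order
            asks[j] = min(max(asks[j] + step, 0), p_max)
            if asks[j] in bids:               # collision with a buy order
                k = bids.index(asks[j])
                prices.append(asks[j])
                bids[k], asks[j] = 0, p_max
    return prices, bids, asks
```

With these conventions the number of orders stays equal to the number of agents, and every buy order remains strictly below every sell order between transactions.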

As pointed out by the authors, this simulation process is similar to the
reaction-diffusion model A + B → ∅ in Physics. In such a model, two
types of particles are inserted at each side of a pipe of length
$\bar{p}$ and move randomly with steps of size 1. Each time two
particles collide, they are annihilated and two new particles are
inserted. The analogy is summarized in Table I. Following

FIG. 31. Snapshot of the limit order book in the Bak, Paczuski and
Shubik model. Reproduced from Bak et al. (1997).



TABLE I. Analogy between the A + B → ∅ reaction model and the order
book in Bak et al. (1997).

Physics        Bak et al. (1997)
-----------    -----------------
Particles      Orders
Finite pipe    Order book
Collision      Transaction


this analogy, it can thus be shown that the variation $\Delta p(t)$ of
the price p(t) satisfies:

    \Delta p(t) \sim t^{1/4} \left( \ln(t/t_0) \right)^{1/2}.    (20)




FIG. 30. Illustration of the Bak, Paczuski and Shubik model: white
particles (buy orders, bid) moving from the left, black particles (sell
orders, ask) moving from the right. Reproduced from Bak et al. (1997).



Thus, at long time scales, the series of price increments simulated in
this model exhibits a Hurst exponent H = 1/4. Compared with the stylized
fact H ≈ 0.7, this sub-diffusive behaviour is a step in the wrong
direction relative to the random walk H = 1/2. Moreover, Slanina (2001)
points out that no fat tails are observed in the distribution of the
returns of the model; the distribution is instead well fitted by an
exponential decay. Other drawbacks of the model could be mentioned. For
example, the reintroduction of orders at each end of the pipe leads to
an unrealistic shape of the order book, as shown in figure 31. The main
drawback of the model is in fact the following: "moving" orders is
highly unrealistic for modelling an order book, and since it does not
reproduce any known financial exchange mechanism, it cannot be the basis
for any larger model. Therefore, attempts by the authors to build
several extensions of this simple framework, in order to reproduce
"stylized facts" by adding fundamental traders, strategies, trends,
etc., are not of interest for us in this review. However, we feel that
the basic model as such is very interesting because of its simplicity
and its "particle" representation of an order-driven market, which has
opened the way for more realistic models.

FIG. 32. Empirical probability density functions of the price increments
in the Maslov model. In inset, log-log plot of the positive increments.
Reproduced from Maslov (2000).



2. Introducing market orders 



Maslov (2000) keeps the zero-intelligence structure of the Bak et al.
(1997) model but adds more realistic features in the order placement and
evolution of the market. First, limit orders are submitted and stored in
the model, without moving. Second, limit orders are submitted around the
best quotes. Third, market orders are submitted to trigger transactions.
More precisely, at each time step, a trader is chosen to perform an
action. This trader can either submit a limit order with probability
$q_l$ or submit a market order with probability $1 - q_l$. Once this
choice is made, the order is a buy or sell order with equal probability.
All orders have a one unit volume.

As usual, we denote p(t) the current price. In case the submitted order
at time step t + 1 is a limit ask (resp. bid) order, it is placed in the
book at price $p(t) + \Delta$ (resp. $p(t) - \Delta$), $\Delta$ being a
random variable uniformly distributed in $]0; \Delta^M]$ with
$\Delta^M = 4$. In case the submitted order at time step t + 1 is a
market order, one order at the opposite best quote is removed and the
price p(t + 1) is recorded. In order to prevent the number of orders in
the order book from growing too large, two mechanisms are proposed by
the author: either keeping a fixed maximum number of orders (by
discarding new limit orders when this maximum is reached), or removing
them after a fixed lifetime if they have not been executed.
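The mechanism can be sketched as follows, using the fixed-lifetime variant to bound the size of the book. The sketch is an illustration under stated assumptions, not the author's code: prices are integer ticks, so $\Delta$ is drawn uniformly in {1, ..., 4}; the function name `simulate_maslov`, the starting price, the lifetime value and the rule that a market order hitting an empty side is simply ignored are all choices made here.

```python
import random

def simulate_maslov(n_steps=5000, q_limit=0.5, delta_max=4,
                    p0=1000, lifetime=1000, seed=0):
    """Return (prices, bids, asks); prices[t] is the last traded price after step t."""
    rng = random.Random(seed)
    bids, asks = [], []          # resting limit orders stored as (price, time)
    p = p0
    prices = [p]
    for t in range(n_steps):
        # Remove limit orders that have exceeded the fixed lifetime.
        bids = [o for o in bids if t - o[1] < lifetime]
        asks = [o for o in asks if t - o[1] < lifetime]
        is_buy = rng.random() < 0.5
        if rng.random() < q_limit:
            # Limit order placed at a random distance from the current price.
            delta = rng.randint(1, delta_max)
            if is_buy:
                bids.append((p - delta, t))
            else:
                asks.append((p + delta, t))
        else:
            # Market order: remove one order at the opposite best quote
            # (time priority among equal prices is not modelled here).
            book = asks if is_buy else bids
            if book:
                best = min(book) if is_buy else max(book)
                book.remove(best)
                p = best[0]
        prices.append(p)
    return prices, bids, asks
```

Because limit bids are always placed strictly below the last traded price and limit asks strictly above it, the simulated book cannot become crossed.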

Numerical simulations show that this model exhibits non-Gaussian
heavy-tailed distributions of returns. In figure 32, the empirical
probability density of the price increments is plotted for several time
scales. For a time scale $\delta t = 1$, the author fits the tails of
the distribution with a power law with exponent 3.0, i.e. reasonable
compared to empirical values. However, the Hurst exponent of the price
series is still H = 1/4 with this model. It should also be noted that
Slanina (2001) proposed an analytical study of the model using a
mean-field approximation (see section VII E below).

This model brings very interesting innovations in or- 
der book simulation: order book with (fixed) limit or- 
ders, market orders, necessity to cancel orders waiting 
too long in the order book. These features are of prime 
importance in any following order book model. 



3. The order book as a deposition-evaporation process 



Challet and Stinchcombe (2001) continue the work of Bak et al. (1997)
and Maslov (2000), and develop the analogy between the dynamics of an
order book and an infinite one-dimensional grid, where particles of two
types (ask and bid) are subject to three types of events: deposition
(limit orders), annihilation (market orders) and evaporation
(cancellation). Note that annihilation occurs when a particle is
deposited on a site occupied by a particle of another type. The analogy
is summarized in Table II. Hence, the model goes as follows: At each



TABLE II. Analogy between the deposition-evaporation process and the
order book in Challet and Stinchcombe (2001).

Physics             Challet and Stinchcombe (2001)
----------------    ------------------------------
Particles           Orders
Infinite lattice    Order book
Deposition          Limit orders submission
Evaporation         Limit orders cancellation
Annihilation        Transaction



time step, a bid (resp. ask) order is deposited with probability
$\lambda$ at a price n(t) drawn according to a Gaussian distribution
centred on the best ask a(t) (resp. best bid b(t)) and with variance
depending linearly on the spread s(t) = a(t) - b(t):
$\sigma(t) = K s(t) + C$. If n(t) > a(t) (resp. n(t) < b(t)), then it is
a market order: annihilation takes place and the price is recorded.
Otherwise, it is a limit order and it is stored in the book. Finally,
each limit order stored in the book has a probability $\delta$ to be
cancelled (evaporation).

Figure 33 shows the average return as a function of the time scale. It
appears that the series of price returns simulated with this model
exhibits a Hurst exponent H = 1/4 for short time scales, which tends to
H = 1/2 for larger time scales. This behaviour might be the consequence
of the random evaporation process (which was not modelled in Maslov
(2000), where H = 1/4 for large time scales). Although some
modifications of the process (more than one order per time step) seem to
shorten the sub-diffusive region, it is clear that no over-diffusive
behaviour is observed.

FIG. 33. Average return $\langle r_{\Delta t} \rangle$ as a function of
$\Delta t$ for different sets of parameters and simultaneous depositions
allowed in the Challet and Stinchcombe model. Reproduced from Challet
and Stinchcombe (2001).



D. Empirical zero-intelligence models 

The three models presented in the previous section VII C have
successively isolated essential mechanisms that are to be used when
simulating a "realistic" market: one order is the smallest entity of the
model; the submission of one order is the time dimension (i.e. event
time is used, not an exogenous time defined by market clearing and
"tâtonnement" on exogenous supply and demand functions); submission of
market orders (as such in Maslov (2000), as "crossing limit orders" in
Challet and Stinchcombe (2001)) and cancellation of orders are taken
into account. On the one hand, one may try to describe these mechanisms
using a small number of parameters, using Poisson processes with
constant rates for order flows, constant volumes, etc. This might lead
to some analytically tractable models, as will be described in section
VII E. On the other hand, one may try to fit more complex empirical
distributions to market data without analytical concern.

This type of modelling is best represented by Mike and Farmer (2008). It
is the first model that proposes an advanced calibration on market data
for both order placement and cancellation. As for volume and time of
arrivals, assumptions of previous models still hold: all orders have the
same volume, and discrete event time is used for simulation, i.e. one
order (limit or market) is submitted per time step. Following Challet
and Stinchcombe (2001), there is no distinction between market and limit
orders, i.e. market orders are limit orders that are submitted across
the spread s(t). More precisely, at each time step, one trading order is
simulated: an ask (resp. bid) trading order is randomly placed at
$n(t) = a(t) + \delta a$ (resp. $n(t) = b(t) + \delta b$) according to a
Student distribution with scale and degrees of freedom calibrated on
market data. If an ask (resp. bid) order satisfies
$\delta a < -s(t) = b(t) - a(t)$ (resp. $\delta b > s(t) = a(t) - b(t)$),
then it crosses the spread: it is a sell (resp. buy) market order and a
transaction occurs at price b(t) (resp. a(t)).

FIG. 34. Lifetime of orders for simulated data in the Mike and Farmer
model, compared to the empirical data used for fitting (legend:
simulation, slope -1.9; real data, slope -2.1). Reproduced from Mike and
Farmer (2008).

During a time step, several cancellations of orders may occur. The
authors propose an empirical distribution for cancellation based on
three components for a given order:

• the position in the order book, measured as the ratio
$y(t) = \Delta(t)/\Delta(0)$, where $\Delta(t)$ is the distance of the
order from the opposite best quote at time t,

• the order book imbalance, measured by the indicator
$N_{imb}(t) = N_a(t)/(N_a(t) + N_b(t))$ (resp.
$N_{imb}(t) = N_b(t)/(N_a(t) + N_b(t))$) for ask (resp. bid) orders,
where $N_a(t)$ and $N_b(t)$ are the number of orders at ask and bid in
the book at time t,

• the total number $N_t(t) = N_a(t) + N_b(t)$ of orders in the book.

Their empirical study leads them to assume that the cancellation
probability has an exponential dependence on y(t), a linear one in
$N_{imb}$ and finally decreases approximately as $1/N_t(t)$ with the
total number of orders. Thus, the probability
$P(C \mid y(t), N_{imb}(t), N_t(t))$ to cancel an ask order at time t is
formally written:

    P(C \mid y(t), N_{imb}(t), N_t(t)) = A \left(1 - e^{-y(t)}\right) \left(N_{imb}(t) + B\right) \frac{1}{N_t(t)},    (21)

where the constants A and B are to be fitted on market data. Figure 34
shows that this empirical formula provides a quite good fit on market
data.
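As a small worked illustration, the cancellation rule of Eq. (21) can be written as a function of the three components. The function names and the clipping of the result to [0, 1] are assumptions of this sketch, and any values of A and B used with it are placeholders, not the fitted values of Mike and Farmer (2008).

```python
import math

def imbalance(n_ask, n_bid, side):
    """Order book imbalance N_imb seen by an order on the given side ('ask' or 'bid')."""
    total = n_ask + n_bid
    if total <= 0:
        raise ValueError("the book must contain at least one order")
    return (n_ask if side == "ask" else n_bid) / total

def cancel_probability(y, n_imb, n_total, A, B):
    """Eq. (21): A * (1 - exp(-y)) * (N_imb + B) / N_t, clipped to [0, 1]."""
    if n_total <= 0:
        raise ValueError("the book must contain at least one order")
    p = A * (1.0 - math.exp(-y)) * (n_imb + B) / n_total
    return min(max(p, 0.0), 1.0)   # assumption: clip so the result is a probability
```

For instance, with placeholder constants A = 1 and B = 0.2, an ask order with y = 1 in a book of 30 asks and 10 bids has imbalance 0.75 and cancellation probability (1 - e^{-1}) × 0.95 / 40.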

Finally, the authors mimic the observed long memory of order signs by
simulating a fractional Brownian motion. The auto-covariance function
$\Gamma(k)$ of the increments of such a process, at lag k, exhibits a
slow decay:

    \Gamma(k) \sim H(2H - 1) k^{2H-2}    (22)

FIG. 35. Cumulative distribution of returns in the Mike and Farmer
model, compared to the empirical data used for fitting. Reproduced from
Mike and Farmer (2008).



and see how modelling based on heterogeneous agents might help to
reproduce non-trivial behaviours. Prior to this development, presented
below in section VII F, we briefly review some analytical works on the
"zero-intelligence" models.



E. Analytical treatments of zero-intelligence models 

In this section we present some analytical results obtained on
zero-intelligence models where processes are kept sufficiently simple so
that a mean-field approximation may be derived (Slanina (2001)) or
probabilities conditional on the state of the order book may be computed
(Cont et al. (2008)). The key assumptions here are such that the process
describing the order book is stationary. This allows either to write a
stable density equation, or to fit the model into a nice mathematical
framework such as ergodic Markov chains.



and it is therefore easy to reproduce the exponent $\beta$ of the decay
of the empirical autocorrelation function of order signs observed on the
market with $H = 1 - \beta/2$.
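The link between the two exponents can be checked numerically. The sketch below uses the standard exact auto-covariance of unit-variance fractional Gaussian noise (the increments of a fractional Brownian motion), compares it with the asymptotic form of Eq. (22), and maps a decay exponent $\beta$ to H; the function names are illustrative only.

```python
def fgn_autocov(k, H):
    """Exact auto-covariance at lag k of unit-variance fractional Gaussian noise."""
    k = abs(k)
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + abs(k - 1) ** (2 * H))

def fgn_autocov_asymptotic(k, H):
    """Large-lag approximation H (2H - 1) k^(2H - 2), as in Eq. (22)."""
    return H * (2 * H - 1) * k ** (2 * H - 2)

def hurst_from_decay_exponent(beta):
    """Matching k^(-beta) with k^(2H - 2) gives beta = 2 - 2H, i.e. H = 1 - beta/2."""
    return 1.0 - beta / 2.0
```

For H > 1/2 the covariance is positive and decays as a power law, which is the long-memory regime relevant for order signs.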

The results of this empirical model are quite satisfying as for return
and spread distribution. The distribution of returns exhibits fat tails
which are in agreement with empirical data, as shown in figure 35. The
spread distribution is also very well reproduced. As their empirical
model has been built on the data of only one stock, the authors test
their model on 24 other data sets of stocks on the same market and find
for half of them a good agreement between empirical and simulated
properties. However, the bad results of the other half suggest that such
a model is still far from being "universal".

Despite these very nice results, some drawbacks have to be pointed out.
The first one is the fact that the stability of the simulated order book
is far from ensured. Simulations using empirical parameters may lead to
situations where the order book is emptied by large consecutive market
orders. Thus, the authors require that there are at least two orders on
each side of the book. This exogenous trick might be important, since it
is activated precisely in the case of rare events that influence the
tails of the distributions. Also, the original model does not focus on
volatility clustering. Gu and Zhou (2009) propose a variant that tackles
this feature. Another important drawback of the model is the way order
signs are simulated. As noted by the authors, using an exogenous
fractional Brownian motion leads to correlated price returns, which is
in contradiction with empirical stylized facts. We also find that at
long time scales it leads to a dramatic increase of volatility. As we
have seen in the first part of the review, the correlation of trade
signs can be at least partly seen as an artefact of execution
strategies. Therefore this element is one of the many that should be
taken into account when "programming" the agents of the model. In order
to do so, we have to leave the (quasi) "zero-intelligence" world



1. Mean-field theory 



Slanina (2001) proposes an analytical treatment of the model introduced
by Maslov (2000) and reviewed above. Let us briefly describe the
formalism used. The main hypothesis is the following: on each side of
the current price level, the density of limit orders is uniform and
constant ($\rho_+$ on the ask side, $\rho_-$ on the bid side). In that
sense, this is a "mean-field" approximation since the individual
position of a limit order is not taken into account. Assuming we are in
a stable state, the arrival of a market order of size s on the ask
(resp. bid) side will make the price change by $x_+ = s/\rho_+$ (resp.
$x_- = s/\rho_-$). It is then observed that the transformations of the
vector $X = (x_+, x_-)$ occurring at each event (new limit order, new
buy market order, new sell market order) are linear transformations that
can easily and explicitly be written. Therefore, an equation satisfied
by the probability distribution P of the vector X of price changes can
be obtained. Finally, assuming further simplifications (such as
$\rho_+ = \rho_-$), one can solve this equation for a tail exponent and
find that the distribution behaves as $P(x) \approx x^{-2}$ for large x.
This analytical result is slightly different from the one obtained by
simulation in Maslov (2000). However, the numerous approximations make
the comparison difficult. The main point here is that some sort of
mean-field approximation is natural if we assume the existence of a
stationary state of the order book, and thus may help handling order
book models.

Smith et al. (2003) also propose some sort of mean-field approximation
for zero-intelligence models. In a similar model (but including a
cancellation process), mean-field theory and dimensional analysis
produce interesting results. For example, it is easy to see that the
book depth (i.e. number of orders) $N_e(p)$ at a price p far away from
the best quotes is given by $N_e(p) = \lambda/\delta$, where $\lambda$
is the rate of arrival of limit orders per unit of time and per unit of
price, and $\delta$ the probability for an order to be cancelled per
unit of time. Indeed, far from the best quotes no market order occurs,
so that if a steady state exists, the number of limit orders per time
step $\lambda$ must be balanced by the number of cancellations
$\delta N_e(p)$ per unit of time, hence the result.
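This balance argument is easy to check on a single price level far from the best quotes. The sketch below is a discrete-time toy version, not the continuous-time model of Smith et al. (2003): at each step every resting order is cancelled independently with probability `delta`, then one new limit order arrives with probability `lam` (so `lam` must be below 1 here). The function name and parameter values are assumptions of the sketch.

```python
import random

def simulate_depth(lam=0.5, delta=0.05, n_steps=50000, burn_in=5000, seed=0):
    """Return the time-averaged number of orders resting at one distant price level."""
    rng = random.Random(seed)
    depth = 0
    total = 0
    for t in range(n_steps):
        # Evaporation: each resting order is cancelled with probability delta.
        cancelled = sum(1 for _ in range(depth) if rng.random() < delta)
        depth -= cancelled
        # Deposition: at most one new limit order per step, with probability lam.
        if rng.random() < lam:
            depth += 1
        if t >= burn_in:
            total += depth
    return total / (n_steps - burn_in)
```

In the stationary state the mean depth m satisfies m = (1 - delta) m + lam, i.e. m = lam/delta, so the value returned for the default parameters should be close to 10.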



2. Explicit computation of probabilities conditionally on 
the state of the order book 



Cont et al. (2008) is an original attempt at an analytical treatment of
limit order books. In their model, the price is constrained to be on a
grid {1, ..., N}. The state of the order book can then be described by a
vector $X(t) = (X_1(t), \ldots, X_N(t))$ where $|X_i(t)|$ is the
quantity offered in the order book at price i. Conventionally,
$X_i(t)$, i = 1, ..., N, is positive on the ask side and negative on the
bid side. As usual, limit orders arrive at level i at a constant rate
$\lambda_i$, and market orders arrive at a constant rate $\mu$. Finally,
at level i, each order can be cancelled at a rate $\theta_i$. Using this
setting, Cont et al. (2008) show that each event (limit order, market
order, cancellation) transforms the vector X in a simple linear way.
Therefore, it is shown that under reasonable conditions, X is an ergodic
Markov chain, and thus admits a stationary state. The original idea is
then to use this formalism to compute conditional probabilities on the
processes. More precisely, it is shown that using Laplace transforms,
one may explicitly compute the probability of an increase of the mid
price conditionally on the current state of the order book.

This original contribution could allow explicit evalu- 
ation of strategies and open new perspectives in high- 
frequency trading. However, it is based on a simple 
model that does not reproduce empirical observations 
such as volatility clustering. Complex models trying to 
include market interactions will not fit into these analyt- 
ical frameworks. We review some of these models in the 
next section. 



F. Towards non-trivial behaviours: modelling market 
interactions 

In all the models we have reviewed until now, flows of orders are
treated as independent processes. Under some (strong) modelling
constraints, we can see the order book as a Markov chain and look for
analytical results (Cont et al. (2008)). In any case, even if the
process is empirically detailed and not trivial (Mike and Farmer
(2008)), we work with the assumption that orders are independent and
identically distributed. This very strong (and false) hypothesis is
similar to the "representative agent" hypothesis in Economics: orders
being successively and independently submitted, we may not expect
anything but regular behaviours. Following the work of economists such
as Kirman (1992, 1993, 2002), one has to translate the heterogeneous
property of the markets into the agent-based models. Agents are not
identical, and not independent.

In this section we present some toy models implementing mechanisms that
aim at bringing heterogeneity: herding behaviour on markets in Cont and
Bouchaud (2000), trend following behaviour in Lux and Marchesi (2000) or
in Preis et al. (2007), threshold behaviour in Cont (2007). Most of the
models reviewed in this section are not order book models, since a
persistent order book is not kept during the simulations. They are
rather price models, where the price changes are determined by the
aggregation of excess supply and demand. However, they identify
essential mechanisms that may clearly explain some empirical data.
Incorporating these mechanisms in an order book model is not yet
achieved but is certainly a prospect for future work.



1. Herding behaviour 



The model presented in Cont and Bouchaud (2000) considers a market with
N agents trading a given stock with price p(t). At each time step,
agents choose to buy or sell one unit of stock, i.e. their demand is
$\phi_i(t) = \pm 1$, i = 1, ..., N, each with probability a, or are idle
with probability 1 - 2a. The price change is assumed to be linearly
linked with the excess demand $D(t) = \sum_{i=1}^{N} \phi_i(t)$ with a
factor $\lambda$ measuring the liquidity of the market:

    p(t+1) = p(t) + \frac{1}{\lambda} \sum_{i=1}^{N} \phi_i(t).    (23)

$\lambda$ can also be interpreted as a market depth, i.e. the excess
demand needed to move the price by one unit. In order to evaluate the
distribution of stock returns from Eq. (23), we need to know the joint
distribution of the individual demands $(\phi_i(t))_{1 \le i \le N}$. As
pointed out by the authors, if the demands $\phi_i$ are independent and
identically distributed with finite variance, then the Central Limit
Theorem applies and the distribution of the price variation
$\Delta p(t) = p(t+1) - p(t)$ will converge to a Gaussian distribution
as N goes to infinity.

The idea here is to model the diffusion of information among traders by randomly linking their demands through clusters. At each time step, agents $i$ and $j$ can be linked with probability $p_{ij} = p = c/N$, $c$ being a parameter measuring the degree of clustering among agents. Therefore, an agent is linked to an average number of $(N - 1)p$ other traders. Once clusters are determined, the demands are forced to be identical among all members of a given cluster. Denoting $n_c(t)$ the number of clusters at a given time step $t$, $W_k$ the size of the $k$-th cluster, $k = 1, \ldots, n_c(t)$, and $\phi_k = \pm 1$ its investment decision, the price variation is then straightforwardly written:

$$\Delta p(t) = \frac{1}{\lambda} \sum_{k=1}^{n_c(t)} W_k \phi_k(t). \tag{24}$$

This modelling is a direct application to the field of finance of the random graph framework as studied in Erdős and Rényi (1960). Kirman (1983) previously suggested it in economics. Using these previous theoretical works, and assuming that the size of a cluster $W_k$ and the decision taken by its members $\phi_k(t)$ are independent, the authors are able to show that the price variation at time $t$ is the sum of $n_c(t)$ independent identically distributed random variables with heavy-tailed distributions:

$$\Delta p(t) = \frac{1}{\lambda} \sum_{k=1}^{n_c(t)} X_k, \tag{25}$$

where the density $f(x)$ of $X_k = W_k \phi_k$ decays as

$$f(x) \underset{|x| \to \infty}{\sim} \frac{A}{|x|^{5/2}} \, e^{-\frac{(c-1)|x|}{W_0}}. \tag{26}$$
Thus, this simple toy model exhibits fat tails in the distribution of price variations, with a decay reasonably close to empirical data. Therefore, Cont and Bouchaud (2000) show that taking into account a naive mechanism of communication between agents (herding behaviour) is able to drive the model out of the Gaussian convergence and produce non-trivial shapes of distributions of price returns.
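The clustering mechanism of Eqs. (24)-(25) can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the function names, the union-find bookkeeping and the parameter values are assumptions made here, and each cluster is taken to buy with probability $a$, sell with probability $a$ and stay idle otherwise.

```python
import numpy as np

def cluster_sizes(n_agents, c, rng):
    """Sizes of the connected components of a random graph with link probability c/N."""
    parent = list(range(n_agents))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    p = c / n_agents
    # Draw each of the N(N-1)/2 possible links independently with probability p.
    rows, cols = np.triu_indices(n_agents, k=1)
    linked = rng.random(rows.size) < p
    for i, j in zip(rows[linked], cols[linked]):
        ri, rj = find(int(i)), find(int(j))
        if ri != rj:
            parent[ri] = rj
    roots = [find(i) for i in range(n_agents)]
    counts = np.bincount(roots, minlength=n_agents)
    return counts[np.unique(roots)]

def price_change(n_agents, c, a, lam, rng):
    """One draw of Delta p = (1/lam) * sum_k W_k * phi_k, as in Eq. (24)."""
    sizes = cluster_sizes(n_agents, c, rng)
    u = rng.random(sizes.size)
    # Cluster decision: +1 with prob. a, -1 with prob. a, 0 (idle) otherwise.
    phi = np.where(u < a, 1, np.where(u < 2 * a, -1, 0))
    return float(np.sum(sizes * phi)) / lam

rng = np.random.default_rng(0)
sample = np.array([price_change(300, 0.9, 0.2, 10.0, rng) for _ in range(200)])
```

Repeating the draw many times and inspecting the empirical distribution of `sample` (for instance its excess kurtosis) is one way to compare the clustered case with the independent-agent case, for which the Gaussian limit is expected.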



Fundamentalists and trend followers 



Lux and Marchesi (2000) proposed a model very much in line with agent-based models in behavioural finance, but where trading rules are kept simple enough so that they can be identified with a presumably realistic behaviour of agents. This model considers a market with $N$ agents that can be part of two distinct groups of traders: $n_f$ traders are "fundamentalists", who share an exogenous idea $p_f$ of the value of the current price $p$; and $n_c$ traders are "chartists" (or trend followers), who make assumptions on the price evolution based on the observed trend (moving average). The total number of agents is constant, so that $n_f + n_c = N$ at any time. At each time step, the price can be moved up or down with a fixed jump size of $\pm 0.01$ (a tick). The probability to go up or down is directly linked to the excess demand $ED$ through a coefficient $\beta$. The demand of each group of agents is determined as follows:

• Each fundamentalist trades a volume $V_f$ proportional, with a coefficient $\gamma$, to the deviation of the current price $p$ from the perceived fundamental value $p_f$: $V_f = \gamma (p_f - p)$.

• Each chartist trades a constant volume $V_c$. Denoting $n_+$ the number of optimistic (buyer) chartists and $n_-$ the number of pessimistic (seller) chartists, the excess demand by the whole group of chartists is written $(n_+ - n_-) V_c$.

Therefore, assuming that there exist some noise traders on the market with random demand $\mu$, the global excess demand is written:

$$ED = (n_+ - n_-) V_c + n_f \gamma (p_f - p) + \mu. \tag{27}$$

The probability that the price goes up (resp. down) is then defined to be the positive (resp. negative) part of $\beta ED$.
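As a small worked illustration of Eq. (27) and of the price-move rule, the sketch below computes the excess demand and the up/down probabilities. The function names are hypothetical, and capping the probabilities at 1 is an assumption added here so that the values remain valid probabilities; the text above does not state how large values of $\beta ED$ are handled.

```python
def excess_demand(n_plus, n_minus, n_f, p, p_f, v_c, gamma, mu=0.0):
    """Global excess demand ED of Eq. (27)."""
    chartists = (n_plus - n_minus) * v_c
    fundamentalists = n_f * gamma * (p_f - p)
    return chartists + fundamentalists + mu

def move_probabilities(ed, beta):
    """Probabilities of an up-tick and of a down-tick.

    Positive and negative parts of beta * ED, capped at 1 (assumption).
    """
    up = min(1.0, max(0.0, beta * ed))
    down = min(1.0, max(0.0, -beta * ed))
    return up, down

# Example: 60 optimistic and 40 pessimistic chartists, 100 fundamentalists,
# price slightly below the perceived fundamental value.
ed = excess_demand(n_plus=60, n_minus=40, n_f=100, p=10.0, p_f=10.5,
                   v_c=0.1, gamma=0.02)
prob_up, prob_down = move_probabilities(ed, beta=0.1)
```

In this example both groups push the price up, so the down-tick probability is zero.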

As observed in Wyart and Bouchaud (2007), fundamentalists are expected to stabilize the market, while chartists should destabilize it. In addition, following Cont and Bouchaud (2000), the authors expect non-trivial features of the price series to result from herding behaviour and transitions between groups of traders. Referring to Kirman's work as well, a mimicking behaviour among chartists is thus proposed. The $n_c$ chartists can change their view on the market (optimistic, pessimistic), their decision being based on a clustering process modelled by an opinion index $x = (n_+ - n_-)/n_c$ representing the weight of the majority. The probabilities $\pi_+$ and $\pi_-$ to switch from one group to another are formally written:

$$\pi_\pm = v \, \frac{n_c}{N} \, e^{\pm U}, \qquad U = \alpha_1 x + \alpha_2 p / v, \tag{28}$$

where $v$ is a constant, and $\alpha_1$ and $\alpha_2$ reflect respectively the weight of the majority's opinion and the weight of the observed price in the chartists' decision. Transitions between fundamentalists and chartists are also allowed, decided by comparison of expected returns (see Lux and Marchesi (2000) for details).

The authors show that the distribution of returns generated by their model has excess kurtosis. Using a Hill estimator, they fit a power law to the fat tails of the distribution and observe exponents roughly ranging from 1.9 to 4.6. They also look for signs of volatility clustering: absolute returns and squared returns exhibit a slow decay of autocorrelation, while raw returns do not. It thus appears that such a model can roughly reproduce some "stylized facts". However, the number of parameters involved, as well as the complicated rules of transition between agents, make a clear identification of the sources of these phenomena and a calibration to market data difficult, if not intractable.
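For readers unfamiliar with the Hill estimator mentioned above, here is a minimal sketch of its standard form, applied to synthetic Pareto data rather than to the model's returns. The function name, the choice of $k$ and the synthetic sample are assumptions made for illustration; in practice the estimate depends noticeably on how many tail observations $k$ are retained.

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimate of the tail exponent from the k largest observations of x."""
    x = np.sort(np.asarray(x, dtype=float))
    tail = x[-k:]           # the k largest values
    threshold = x[-k - 1]   # the (k+1)-th largest value
    return 1.0 / np.mean(np.log(tail / threshold))

rng = np.random.default_rng(0)
# Classical Pareto sample with x_min = 1 and tail exponent 3:
# numpy's Generator.pareto draws from the Lomax law, hence the shift by 1.
sample = 1.0 + rng.pareto(3.0, size=100_000)
alpha_hat = hill_estimator(sample, k=2000)
```

For return series, the estimator would typically be applied to absolute returns (or separately to each tail).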

Alfi et al. (2009a,b) provide a somewhat simplifying view on the Lux-Marchesi model. They clearly identify the fundamentalist behaviour, the chartist behaviour, the herding effect and the observation of the price by the agents as four essential effects of an agent-based financial model. They show that the number of agents plays a crucial role in a Lux-Marchesi-type model: more precisely, the stylized facts are reproduced only with a finite number of agents, not when the number of agents grows asymptotically, in which case the model stays in a fundamentalist regime. There is a finite-size effect that may prove important for further studies.

The role of the trend following mechanism in producing non-trivial features in price time series is also studied in Preis et al. (2007). The starting point is an order book model similar to Challet and Stinchcombe (2001) and Smith et al. (2003): at each time step, liquidity providers submit limit orders at rate $\lambda$ and liquidity takers submit market orders at rate $\mu$. As expected, this zero-intelligence framework does not produce fat tails in the distribution of (log-)returns nor an over-diffusive Hurst exponent. Then, a stochastic link between order placement and market trend is added: it is assumed that liquidity providers observing a trend in the market will act accordingly and submit limit orders at a wider depth in the order book. Although the assumption behind such a mechanism may not be empirically confirmed (a questionable symmetry in order placement is assumed) and should be further discussed, it is interesting enough that it directly provides fat tails in the log-return distributions and an over-diffusive Hurst exponent $H \approx 0.6$ to $0.7$ for medium time-scales, as shown in figure 36.

FIG. 36. Hurst exponent found in the Preis model for different number of agents when including random demand perturbation and dynamic limit order placement depth. Reproduced from Preis et al. (2007).



3. Threshold behaviour 

We finally review a model focusing primarily on reproducing the stylized fact of volatility clustering, while most of the previous models we have reviewed were mostly focused on fat tails of log returns. Cont (2007) proposes a model with a rather simple mechanism to create volatility clustering. The idea is that volatility clustering characterizes several regimes of volatility (quiet periods vs bursts of activity). Instead of implementing an exogenous change of regime, the author defines the following trading rules.

At each period, an agent $i \in \{1, \ldots, N\}$ can issue a buy or a sell order: $\phi_i(t) = \pm 1$. Information is represented by a series of i.i.d. Gaussian random variables $(\epsilon_t)$. This public information $\epsilon_t$ is a forecast for the value $r_{t+1}$ of the return of the stock. Each agent $i \in \{1, \ldots, N\}$ decides whether to follow this information according to a threshold $\theta_i > 0$ representing its sensitivity to the public information:

$$\phi_i(t) = \begin{cases} 1 & \text{if } \epsilon_t > \theta_i(t), \\ 0 & \text{if } |\epsilon_t| < \theta_i(t), \\ -1 & \text{if } \epsilon_t < -\theta_i(t). \end{cases} \tag{29}$$

Then, once every choice is made, the price evolves according to the excess demand $D(t) = \sum_{i=1}^{N} \phi_i(t)$, in a way similar to Cont and Bouchaud (2000). At the end of each time step $t$, thresholds are asynchronously updated. Each agent has a probability $s$ to update its threshold $\theta_i(t)$. In such a case, the new threshold $\theta_i(t+1)$ is defined to be the absolute value $|r_t|$ of the return just observed. In short:

$$\theta_i(t+1) = \mathbf{1}_{\{u_i(t) < s\}} |r_t| + \mathbf{1}_{\{u_i(t) > s\}} \theta_i(t), \tag{30}$$

where the $u_i(t)$ are i.i.d. random variables uniformly distributed on $[0, 1]$.



The author shows that the time series simulated with such a model do exhibit some realistic facts on volatility. In particular, long range correlations of absolute returns are observed. The strength of this model is that it directly links the state of the market with the decision of the trader. Such a feedback mechanism is essential in order to obtain non-trivial characteristics. Of course, the model presented in Cont (2007) is too simple to be fully calibrated on empirical data, but its mechanism could be used in a more elaborate agent-based model in order to reproduce the empirical evidence of volatility clustering.
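The trading and updating rules of Eqs. (29) and (30) translate almost line by line into a simulation loop. The sketch below is only illustrative: the linear, normalized price impact $r_t = D(t)/(\lambda N)$, the initial thresholds and all parameter values are assumptions made here, and the function name is hypothetical.

```python
import numpy as np

def simulate_threshold_model(n_agents=500, n_steps=2000, s=0.05,
                             sigma=0.01, lam=10.0, seed=0):
    """Simulate returns from the threshold rules of Eqs. (29)-(30)."""
    rng = np.random.default_rng(seed)
    # Initial thresholds: absolute values of Gaussian draws (assumption).
    theta = np.abs(rng.normal(0.0, sigma, size=n_agents))
    returns = np.empty(n_steps)
    for t in range(n_steps):
        eps = rng.normal(0.0, sigma)  # public information epsilon_t
        # Eq. (29): buy, stay idle, or sell depending on the threshold.
        phi = np.where(eps > theta, 1, np.where(eps < -theta, -1, 0))
        # Linear, normalized price impact of the excess demand (assumption).
        returns[t] = phi.sum() / (lam * n_agents)
        # Eq. (30): each agent resets its threshold to |r_t| with probability s.
        update = rng.random(n_agents) < s
        theta = np.where(update, abs(returns[t]), theta)
    return returns

r = simulate_threshold_model()
```

One can then compare the autocorrelation of `np.abs(r)` with that of `r` to look for the slow decay associated with volatility clustering.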



G. Remarks 

Let us attempt to make some concluding remarks on these developments of agent-based models for order books. In Table III, we summarize some key features of some of the order book models reviewed in this section. Among important elements for future modelling, we may mention the cancellation of orders, which is the least realistic mechanism implemented in existing models; the order book stability, which is always exogenously enforced (see our review of Mike and Farmer (2008) above); and the dependence between order flows (see e.g. Muni Toke (2010) and references therein). Empirical estimation of these mechanisms is still challenging.

Emphasis has been put in this section on order book modelling, a field that is at the crossroads of many larger disciplines (market microstructure, behavioural finance and physics). Market microstructure is essential since it defines in many ways the goal of the modelling. We pointed out that it is not a coincidence that the work by Garman (1976) was published when computerization of exchanges was about to make the electronic order book the key of all trading. Regulatory issues that pushed early studies are still very important today. Realistic order book models could be an invaluable tool in testing and evaluating the effects of regulations such as the 2005 Regulation NMS in the USA, or the 2007 MiFID in Europe.



VIII. AGENT-BASED MODELLING FOR WEALTH 
DISTRIBUTIONS: KINETIC THEORY MODELS 

The distribution of money, wealth or income, i.e., how such quantities are shared among the population of a given country and among different countries, is a topic which has been studied by economists for a long time. The relevance of the topic to us is twofold: From the point of view of the science of Complex Systems, wealth distributions represent a unique example of a quantitative outcome of a collective behavior which can be directly compared with the predictions of theoretical models and numerical experiments. Also, there is a basic interest in wealth distributions from the social point of view, in particular in their degree of (in)equality. To this aim, the Gini coefficient (or the Gini index, if expressed as a percentage), developed by the Italian statistician Corrado Gini, represents a concept commonly employed to measure inequality of wealth distributions or, more generally, how uneven a given distribution is. For a cumulative distribution function $F(y)$ that is piecewise differentiable, has a finite mean $\mu$, and is zero for $y < 0$, the Gini coefficient is defined as

$$G = 1 - \frac{1}{\mu} \int_0^\infty dy \, (1 - F(y))^2 = \frac{1}{\mu} \int_0^\infty dy \, F(y) (1 - F(y)). \tag{31}$$

It can also be interpreted statistically as half the relative mean difference. Thus the Gini coefficient is a number between 0 and 1, where 0 corresponds to perfect equality (where everyone has the same income) and 1 corresponds to perfect inequality (where one person has all the income, and everyone else has zero income). Some values of $G$ for some countries are listed in Table IV.
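The interpretation of $G$ as half the relative mean difference gives a direct way to estimate it from a finite sample. The following sketch uses that sample formula; the function name is an assumption, and the sorted-values shortcut is a standard rewriting of the double sum over all pairs.

```python
import numpy as np

def gini(x):
    """Sample Gini coefficient: half the relative mean absolute difference.

    Equivalent to sum_{i,j} |x_i - x_j| / (2 n^2 mean(x)), computed from
    the sorted values to avoid the O(n^2) double sum.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    rank = np.arange(1, n + 1)
    return float(np.sum((2 * rank - n - 1) * x) / (n * np.sum(x)))

g_equal = gini([1.0, 1.0, 1.0, 1.0])     # everyone holds the same amount
g_single = gini([0.0, 0.0, 0.0, 1.0])    # one agent holds everything
rng = np.random.default_rng(0)
g_exponential = gini(rng.exponential(1.0, size=200_000))
```

With this estimator, a sample in which one of $n$ agents holds everything gives $(n-1)/n$, which tends to 1 for large $n$, and an exponential distribution gives a value close to 0.5.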

Let us start by considering the basic economic quantities: money, wealth and income.

A. Money, wealth and income

A common definition of money suggests that money is the "[c]ommodity accepted by general consent as medium of economic exchange". In fact, money circulates from one economic agent (which can represent an individual, firm, country, etc.) to another, thus facilitating trade. It is "something which all other goods or services are traded for" (for details see Shostak (2000)). Throughout history various commodities have been used as money, for these cases termed "commodity money", which include for example rare seashells or beads, and cattle (such as cows in India). Recently, "commodity money" has been replaced by other forms referred to as "fiat money", which have gradually become the most common ones, such as metal coins and paper notes. Nowadays, other forms of money, such as electronic money, have become the most frequent form used to carry out transactions. In any case the most relevant points about money are its basic functions, which according to standard economic theory are

• to serve as a medium of exchange, which is universally accepted in trade for goods and services;

• to act as a measure of value, making possible the determination of the prices and the calculation of costs, or profit and loss;

• to serve as a standard of deferred payments, i.e., a tool for the payment of debt or the unit in which loans are made and future transactions are fixed;

• to serve as a means of storing wealth not immediately required for use.

A related feature relevant for the present investigation is that money is the medium in which prices or values of all commodities as well as costs, profits, and transactions can be determined or expressed. Wealth is usually understood as things that have economic utility (monetary value or value of exchange), or material goods or property; it also represents the abundance of objects of value (or riches) and the state of having accumulated these objects; for our purpose, it is important to bear in mind that wealth can be measured in terms of money. Also income, defined in Case and Fair (2008) as "the sum of all the wages, salaries, profits, interests payments, rents and other forms of earnings received... in a given period of time", is a quantity which can be measured in terms of money (per unit time).


B. Modelling wealth distributions 



Footnotes: NMS stands for National Market System. MiFID stands for Markets in Financial Instruments Directive. The definition of money quoted in Sec. VIII A is from Encyclopaedia Britannica, retrieved June 17, 2010, from Encyclopaedia Britannica Online.



It was first observed by Pareto (1897b) that in an economy the higher end of the distribution of income $f(x)$ follows a power law,

$$f(x) \sim x^{-1-\alpha}, \tag{32}$$

with $\alpha$, now known as the Pareto exponent, estimated by him to be $\alpha \approx 3/2$. For the last hundred years the value of $\alpha \approx 3/2$ seems to have changed little in time and across the various capitalist economies (see Yakovenko and Rosser (2009) and references therein).
Gibrat (1931) clarified that Pareto's law is valid only for the high income range, whereas for the middle income range he suggested that the income distribution is described by a log-normal probability density

$$f(x) \sim \frac{1}{x \sqrt{2 \pi \sigma^2}} \exp\left\{ -\frac{\log^2(x/x_0)}{2 \sigma^2} \right\}, \tag{33}$$

where $\log(x_0) = \langle \log(x) \rangle$ is the mean value of the logarithmic variable and $\sigma^2 = \langle [\log(x) - \log(x_0)]^2 \rangle$ the corresponding variance. The factor $\beta = 1/\sqrt{2 \sigma^2}$, also known as the Gibrat index, measures the equality of the distribution.

TABLE III. Summary of the characteristics of the reviewed limit order book models.

| Model | Stigler (1961) | Garman (1976) | Bak, Paczuski and Shubik (1997) | Maslov (2000) | Challet and Stinchcombe (2001) | Mike and Farmer (2008) |
|---|---|---|---|---|---|---|
| Price range | Finite grid | Finite grid | Finite grid | Unconstrained | Unconstrained | Unconstrained |
| Clock | Trade time | Physical time | Aggregated time | Event time | Aggregated time | Aggregated time |
| Flows / Agents | One zero-intelligence agent / One flow | One zero-intelligence agent / Two flows (buy/sell) | N agents owning each one limit order | One zero-intelligence flow (limit order with fixed probability, else market order) | One zero-intelligence agent / One flow | One zero-intelligence agent / One flow |
| Limit orders | Uniform distribution on the price grid | Two Poisson processes for buy and sell orders | Moving at each time step by one tick | Uniformly distributed in a finite interval around last price | Normally distributed around best quote | Student-distributed around best quote |
| Market orders | Defined as crossing limit orders | Defined as crossing limit orders | Defined as crossing limit orders | Submitted as such | Defined as crossing limit orders | Defined as crossing limit orders |
| Cancellation orders | Pending orders are cancelled after a fixed number of time steps | None | None (constant number of pending orders) | Pending orders are cancelled after a fixed number of time steps | Pending orders can be cancelled with fixed probability at each time step | Pending orders can be cancelled with 3-parameter conditional probability at each time step |
| Volume | Unit | Unit | Unit | Unit | Unit | Unit |
| Order signs | Independent | Independent | Independent | Independent | Independent | Correlated with a fractional Brownian motion |
| Claimed results | Return distribution is power-law 0.3 / Cut-off because finite grid | Microstructure is responsible for negative correlation of consecutive price changes | No fat tails for returns / Hurst exponent 1/4 for price increments | Fat tails for distributions of returns / Hurst exponent 1/4 | Hurst exponent 1/4 for short time scales, tending to 1/2 for larger time scales | Fat tails distributions of returns / Realistic spread distribution / Unstable order book |

More recent empirical studies on income distribution have been carried out by physicists, e.g. those by Dragulescu and Yakovenko (2001b,a) for UK and US, by Fujiwara et al. (2003) for Japan, and by Nirei and Souma (2007) for US and Japan. For an overview see Yakovenko and Rosser (2009). The distributions obtained have been shown to follow either the log-normal (Gamma like) or power-law types, depending on the range of wealth, as shown in Fig. 37.

One of the current challenges is to write down the "microscopic equation" which governs the dynamics of the evolution of wealth distributions, possibly predicting the observed shape of wealth distributions, including the exponential law at intermediate values of wealth as well as the century-old Pareto law. To this aim, several studies have been made to investigate the characteristics of the real income distribution and provide theoretical models or explanations (see e.g. reviews by Lux (2005), Chatterjee and Chakrabarti (2007), Yakovenko and Rosser (2009)).

FIG. 37. Income distributions in the US (left) and Japan (right). Reproduced and adapted from Chakrabarti and Chatterjee (2003), available at arXiv:cond-mat/0302147.

TABLE IV. Gini indices (in percent) of some countries (from Human Development Indicators of the United Nations Human Development Report 2004, pp. 50-53, available at http://hdr.undp.org/en/reports/global/hdr2004. More recent data are also available from their website.)

| Country | Gini index (%) |
|---|---|
| Denmark | 24.7 |
| Japan | 24.9 |
| Sweden | 25.0 |
| Norway | 25.8 |
| Germany | 28.3 |
| India | 32.5 |
| France | 32.7 |
| Australia | 35.2 |
| UK | 36.0 |
| USA | 40.8 |
| Hong Kong | 43.4 |
| China | 44.7 |
| Russia | 45.6 |
| Mexico | 54.6 |
| Chile | 57.1 |
| Brazil | 59.1 |
| South Africa | 59.3 |
| Botswana | 63.0 |
| Namibia | 70.7 |

The model of Gibrat (1931) and other models formulated in terms of a Langevin equation for a single wealth variable, subjected to multiplicative noise (Mandelbrot (1960); Levy and Solomon (1996); Sornette (1998); Burda et al. (2003)), can lead to equilibrium wealth distributions with a power law tail, since they converge toward a log-normal distribution. However, the fit of real wealth distributions does not turn out to be as good as that obtained using e.g. a $\Gamma$- or a $\beta$-distribution, in particular due to too large asymptotic variances (Angle (1986)). Other models use a different approach and describe the wealth dynamics as a wealth flow due to exchanges between (pairs of) basic units. In this respect, such models are basically different from the class of models formulated in terms of a Langevin equation for a single wealth variable. For example, Solomon and Levy (1996) studied the generalized Lotka-Volterra equations in relation to power-law wealth distribution. Ispolatov et al. (1998) studied random exchange models of wealth distributions. Other models describing wealth exchange have been formulated using matrix theory (Gupta (2006)), the master equation (Bouchaud and Mezard (2000); Dragulescu and Yakovenko (2000); Ferrero (2004)), the Boltzmann equation approach (Dragulescu and Yakovenko (2000); Slanina (2004); Repetowicz et al. (2005); Cordier et al. (2005); Matthes and Toscani (2007); Düring and Toscani (2007); Düring et al. (2008)), or Markov chains (Scalas et al. (2006, 2007); Garibaldi et al. (2007)).

It should be mentioned that one of the earliest modelling efforts was made by Champernowne (1953). Since then many economists, Gabaix (1999) and Benhabib and Bisin (2009) amongst others, have also studied mechanisms for power laws, and distributions of wealth.

In the two following sections we consider in greater detail a class of models usually referred to as kinetic wealth exchange models (KWEM), formulated through finite time difference stochastic equations (Angle (1986, 2002, 2006); Chakraborti and Chakrabarti (2000); Dragulescu and Yakovenko (2000); Chakraborti (2002); Hayes (2002); Chatterjee et al. (2003); Das and Yarlagadda (2003); Scafetta et al. (2004); Iglesias et al. (2003, 2004); Ausloos and Pekalski (2007)). From the studies carried out using wealth-exchange models, it emerges that it is possible to use them to generate power law distributions.



C. Homogeneous kinetic wealth exchange models 

Here and in the next section we consider KWEMs, which are statistical models of a closed economy. Their goal, rather than describing the market dynamics in terms of intelligent agents, is to predict the time evolution of the distribution of some main quantity, such as wealth, by studying the corresponding flow process among individuals. The underlying idea is that however complicated the detailed rules of wealth exchanges can be, their average behaviour can be described in a relatively simpler way and will share some universal properties with other transport processes, due to general conservation constraints and the effect of the fluctuations due to the environment or associated with the individual behaviour. In this, there is a clear analogy with the general theory of transport phenomena (e.g. of energy).

In these models the states of agents are defined in terms of the wealth variables $\{x_n\}$, $n = 1, 2, \ldots, N$. The evolution of the system is carried out according to a trading rule between agents which, for obtaining the final equilibrium distribution, can be interpreted as the actual time evolution of the agent states as well as a Monte Carlo optimization. The algorithm is based on a simple update rule performed at each time step $t$, when two agents $i$ and $j$ are extracted randomly and an amount of wealth $\Delta x$ is exchanged,

$$x_i' = x_i - \Delta x, \qquad x_j' = x_j + \Delta x. \tag{34}$$

Notice that the quantity $x$ is conserved during single transactions, $x_i' + x_j' = x_i + x_j$, where $x_i = x_i(t)$ and $x_j = x_j(t)$ are the agent wealths before, whereas $x_i' = x_i(t+1)$ and $x_j' = x_j(t+1)$ are the final ones after the transaction. Several rules have been studied for the model defined by Eqs. (34). It is noteworthy that, although this theory was originally derived from the entropy maximization principle of statistical mechanics, it has recently been shown that the same could be derived from the utility maximization principle as well, following a standard exchange model with Cobb-Douglas utility function (as explained later), which bridges physics and economics.



1. Exchange models without saving 

In a simple version of KWEM considered in the works by Bennati (1988a,b, 1993) and also studied by Dragulescu and Yakovenko (2000) the money difference $\Delta x$ in Eqs. (34) is assumed to have a constant value, $\Delta x = \Delta x_0$. Together with the constraint that transactions can take place only if $x_i' > 0$ and $x_j' > 0$, this leads to an equilibrium exponential distribution, see the curve for $\lambda = 0$ in Fig. 38.

Various other trading rules were studied by Dragulescu and Yakovenko (2000), choosing $\Delta x$ as a random fraction of the average money between the two agents, $\Delta x = \epsilon (x_i + x_j)/2$, corresponding to a $\Delta x = (1 - \epsilon) x_i - \epsilon x_j$ in (34), or of the average money of the whole system, $\Delta x = \epsilon \langle x \rangle$.

The models mentioned, as well as more complicated ones (Dragulescu and Yakovenko (2000)), lead to an equilibrium wealth distribution with an exponential tail

$$f(x) \sim \beta \exp(-\beta x), \tag{35}$$

with the effective temperature $\beta^{-1}$ of the order of the average wealth, $\beta^{-1} = \langle x \rangle$. This result is largely independent of the details of the models, e.g. the multi-agent nature of the interaction, the initial conditions, and the random or consecutive order of extraction of the interacting agents. The Boltzmann distribution is characterized by a majority of poor agents and a few rich agents (due to the exponential tail), and has a Gini coefficient of 0.5.
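The convergence to an exponential distribution with Gini coefficient 0.5 can be checked with a short Monte Carlo experiment. The sketch below uses the rule $\Delta x = (1 - \epsilon) x_i - \epsilon x_j$ in Eqs. (34), i.e. a random reshuffling of the pair's total wealth; pairing all agents at once in each sweep, the function names and the parameter values are implementation choices assumed here, not taken from the cited works.

```python
import numpy as np

def gini(x):
    """Sample Gini coefficient (half the relative mean absolute difference)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    rank = np.arange(1, n + 1)
    return float(np.sum((2 * rank - n - 1) * x) / (n * np.sum(x)))

def simulate_random_exchange(n_agents=1000, n_sweeps=200, seed=0):
    """Kinetic exchange without saving, starting from equal wealth 1."""
    rng = np.random.default_rng(seed)
    x = np.ones(n_agents)
    half = n_agents // 2
    for _ in range(n_sweeps):
        # Form disjoint random pairs (i, j); each agent trades once per sweep.
        perm = rng.permutation(n_agents)
        i, j = perm[:half], perm[half:2 * half]
        eps = rng.random(half)
        total = x[i] + x[j]
        # x_i' = x_i - dx with dx = (1 - eps) x_i - eps x_j, i.e. x_i' = eps * total.
        x[i] = eps * total
        x[j] = (1.0 - eps) * total
    return x

wealth = simulate_random_exchange()
g = gini(wealth)
```

The total wealth is conserved by construction, and the sample Gini coefficient `g` can be compared with the value 0.5 expected for the exponential law of Eq. (35).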



2. Exchange models with saving 

As a generalization and more realistic version of the basic exchange models, a saving criterion can be introduced. Angle (1983), motivated by the surplus theory, introduced a unidirectional model of wealth exchange, in which only a fraction of wealth smaller than one can pass from one agent to the other, with a $\Delta x = \epsilon x_i$ or $(-\epsilon x_j)$, where the direction of the flow is determined by the agent wealth (Angle (1983, 1986)). Later Angle introduced the One-Parameter Inequality Process (OPIP) where a constant fraction $1 - \omega$ is saved before the transaction (Angle (2002)) by the agent whose wealth decreases, defined by an exchanged wealth amount $\Delta x = \omega x_i$ or $-\omega x_j$, again with the direction of the transaction determined by the relative difference between the agent wealths.

FIG. 38. Probability density for wealth $x$. The curve for $\lambda = 0$ is the Boltzmann function $f(x) = \langle x \rangle^{-1} \exp(-x/\langle x \rangle)$ for the basic model of Sec. VIII C 1. The other curves correspond to a global saving propensity $\lambda > 0$, see Sec. VIII C 2.

A "saving parameter" $0 < \lambda < 1$, representing the fraction of wealth saved, was introduced in the model by Chakraborti and Chakrabarti (2000). In this model (CC) wealth flows simultaneously toward and from each agent during a single transaction, the dynamics being defined by the equations

$$x_i' = \lambda x_i + \epsilon (1 - \lambda)(x_i + x_j), \qquad x_j' = \lambda x_j + (1 - \epsilon)(1 - \lambda)(x_i + x_j), \tag{36}$$

or, equivalently, by a $\Delta x$ in (34) given by

$$\Delta x = (1 - \lambda)[(1 - \epsilon) x_i - \epsilon x_j]. \tag{37}$$

TABLE V. Analogy between the kinetic theory of gases and the kinetic exchange model of wealth.

| | Kinetic model | Economy model |
|---|---|---|
| variable | $K$ (kinetic energy) | $x$ (wealth) |
| units | $N$ particles | $N$ agents |
| interaction | collisions | trades |
| dimension | integer $D$ | real number $D_\lambda$ |
| temperature definition | $k_B T = 2 \langle K \rangle / D$ | $T_\lambda = 2 \langle x \rangle / D_\lambda$ |
| reduced variable | $\xi = K / k_B T$ | $\xi = x / T_\lambda$ |
| equilibrium distribution | $f(\xi) = \gamma_{D/2}(\xi)$ | $f(\xi) = \gamma_{D_\lambda/2}(\xi)$ |

These models, apart from the OPIP model of Angle which has the remarkable property of leading to a power law in a suitable range of $\omega$, can be well fitted by a $\Gamma$-distribution. The $\Gamma$-distribution is characterized by a mode $x_m > 0$, in agreement with real data of wealth and income distributions (Dragulescu and Yakovenko (2001a); Ferrero (2004); Silva and Yakovenko (2005); Sala-i-Martin and Mohapatra (2002); Sala-i-Martin (2002); Aoyama et al. (2003)). Furthermore, the limit for small $x$ is zero, i.e. $P(x \to 0) \to 0$, see the example in Fig. 38. In the particular case of the model by Chakraborti and Chakrabarti (2000), the explicit distribution is well fitted by

$$f(x) = n \langle x \rangle^{-1} \gamma_n(n x / \langle x \rangle) = \frac{1}{\Gamma(n)} \frac{n}{\langle x \rangle} \left( \frac{n x}{\langle x \rangle} \right)^{n-1} \exp\left( -\frac{n x}{\langle x \rangle} \right), \tag{38}$$

$$n(\lambda) \equiv \frac{D_\lambda}{2} = 1 + \frac{3 \lambda}{1 - \lambda}, \tag{39}$$

where $\gamma_n(\xi)$ is the standard $\Gamma$-distribution. This particular functional form has been conjectured on the basis of the excellent fitting provided to numerical data (Angle (1983, 1986); Patriarca et al. (2004b,a, 2009)). For more information and a comparison of similar fittings for different models see Patriarca et al. (2010). Very recently, Lallouache et al. (2010) have shown using the distributional form of the equation and moment calculations that strictly speaking the Gamma distribution is not the solution of Eq. (36), confirming the earlier results of Repetowicz et al. (2005). However, the Gamma distribution is a very good approximation.
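Eqs. (38) and (39) are easy to evaluate numerically, which is convenient when comparing the fitting form with simulated histograms. The sketch below is an illustration with hypothetical function names; `lam` denotes the saving parameter $\lambda$ and `mean_wealth` the average wealth $\langle x \rangle$.

```python
import math
import numpy as np

def effective_n(lam):
    """Shape parameter n(lambda) = D_lambda / 2 of Eq. (39)."""
    return 1.0 + 3.0 * lam / (1.0 - lam)

def gamma_fit_density(x, lam, mean_wealth=1.0):
    """Fitting form of Eq. (38): Gamma density of shape n and mean <x>."""
    n = effective_n(lam)
    xi = n * np.asarray(x, dtype=float) / mean_wealth
    return (n / mean_wealth) * xi ** (n - 1.0) * np.exp(-xi) / math.gamma(n)

# Numerical check of normalization and mean for lambda = 0.5 (n = 4).
grid = np.arange(0.0, 40.0, 1e-3)
density = gamma_fit_density(grid, lam=0.5)
norm = float(np.sum(density) * 1e-3)
mean = float(np.sum(grid * density) * 1e-3)
```

For `lam=0.0` the shape parameter equals 1 and the function reduces to the exponential law $\langle x \rangle^{-1} \exp(-x/\langle x \rangle)$ of the model without saving.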

The ubiquitous presence of $\Gamma$-functions in the solutions of kinetic models (see also below heterogeneous models) suggests a close analogy with the kinetic theory of gases. In fact, interpreting $D_\lambda = 2n$ as an effective dimension, the variable $x$ as kinetic energy, and introducing the effective temperature $\beta^{-1} = T_\lambda = 2 \langle x \rangle / D_\lambda$ according to the equipartition theorem, Eqs. (38) and (39) define the canonical distribution $\beta \gamma_n(\beta x)$ for the kinetic energy of a gas in $D_\lambda = 2n$ dimensions, see Patriarca et al. (2004a) for details. The analogy is illustrated in Table V and the dependences of $D_\lambda = 2n$ and of $\beta^{-1} = T_\lambda$ on the saving parameter $\lambda$ are shown in Fig. 39.

The exponential distribution is recovered as a special case, for $n = 1$. In the limit $\lambda \to 1$, i.e. for $n \to \infty$, the distribution $f(x)$ above tends to a Dirac $\delta$-function, as shown in Patriarca et al. (2004a) and qualitatively illustrated by the curves in Fig. 38. This shows that a large saving criterion leads to a final state in which economic agents tend to have similar amounts of money and, in the limit of $\lambda \to 1$, exactly the same amount $\langle x \rangle$.

FIG. 39. Effective dimension $D_\lambda$ and temperature $T$ as a function of the saving parameter $\lambda$.

The equivalence between a kinetic wealth-exchange model with saving propensity $\lambda \ge 0$ and an $N$-particle system in a space with dimension $D_\lambda \ge 2$ is suggested by simple considerations about the kinetics of collision processes between two molecules. In one dimension, particles undergo head-on collisions in which the whole amount of kinetic energy can be exchanged. In a larger number of dimensions the two particles will not, in general, travel exactly along the same line in opposite directions, and only a fraction of the energy can be exchanged. It can be shown that during a binary elastic collision in $D$ dimensions only a fraction $1/D$ of the total kinetic energy is exchanged on average for kinematic reasons; see Chakraborti and Patriarca (2008) for details. The same $1/D$ dependence is in fact obtained by inverting Eq. (39), which gives for the fraction of exchanged wealth $1 - \lambda = 6/(D_\lambda + 4)$.
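The inversion can be checked with exact rational arithmetic. The sketch below assumes that Eq. (39), which is not reproduced in this section, has the form $n(\lambda) = 1 + 3\lambda/(1-\lambda)$; the function names are illustrative only.

```python
from fractions import Fraction

def effective_dimension(lam):
    """D_lambda = 2 n(lambda), ASSUMING n(lambda) = 1 + 3*lambda/(1 - lambda)."""
    return 2 * (1 + 3 * lam / (1 - lam))

def exchanged_fraction(dim):
    """Fraction of exchanged wealth, 1 - lambda = 6 / (D_lambda + 4)."""
    return Fraction(6) / (dim + 4)

# For each saving propensity, the last two printed values should coincide.
for lam in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(9, 10)):
    dim = effective_dimension(lam)
    print(lam, dim, exchanged_fraction(dim), 1 - lam)
```

Under this assumption, $\lambda = 0$ gives $D_\lambda = 2$ and an exchanged fraction equal to one, while larger saving propensities correspond to higher effective dimensions and smaller exchanged fractions.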

Not all homogeneous models lead to distributions with an exponential tail. For instance, in the model studied in Chakraborti (2002) an agent $i$ can lose all his wealth, thus becoming unable to trade again: after a sufficient number of transactions, only one trader survives in the market and owns the entire wealth. The equilibrium distribution has a very different shape, as explained below.

In the toy model it is assumed that both economic agents $i$ and $j$ invest the same amount $x_{\min}$, which is taken as the minimum wealth between the two agents, $x_{\min} = \min\{x_i, x_j\}$. The wealths after the trade are $x_i' = x_i + \Delta x$ and $x_j' = x_j - \Delta x$, where $\Delta x = (2\epsilon - 1)\,x_{\min}$.

We note that once an agent has lost all his wealth, he is unable to trade because $x_{\min}$ has become zero. Thus, a trader is effectively driven out of the market once he loses all his wealth. In this way, after a sufficient number of transactions only one trader survives in the market with the entire amount of wealth, whereas the rest of the traders have zero wealth. This corresponds to a distribution with Gini coefficient equal to unity.
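The concentration of wealth in this toy model can be illustrated with a short simulation. The following is a minimal sketch, not the implementation used in the cited work: it assumes a trade of the form $\Delta x = (2\epsilon - 1)\,x_{\min}$ with $\epsilon$ uniform in $(0,1)$, and the function names, the number of agents and the number of trades are arbitrary choices. For a finite number $n$ of agents the Gini coefficient of a fully concentrated distribution is $(n-1)/n$, which tends to unity for large $n$.

```python
import random

def gini(wealth):
    """Gini coefficient of a list of non-negative wealths."""
    xs = sorted(wealth)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    # Standard formula based on the rank-weighted sum of the sorted values.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

def simulate_toy_model(n_agents=20, n_trades=50_000, seed=0):
    """Random pairwise trades in which both agents stake x_min = min(x_i, x_j)."""
    rng = random.Random(seed)
    wealth = [1.0] * n_agents
    for _ in range(n_trades):
        i, j = rng.sample(range(n_agents), 2)
        x_min = min(wealth[i], wealth[j])
        dx = (2.0 * rng.random() - 1.0) * x_min   # assumed trade rule
        wealth[i] += dx
        wealth[j] -= dx
    return wealth

final = simulate_toy_model()
print("largest share:", max(final) / sum(final), "Gini:", gini(final))
```

In floating-point arithmetic the losers' wealth decays towards zero rather than vanishing in a single step, but the total wealth is conserved and the distribution approaches the fully concentrated state described above.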

Now, a situation is said to be Pareto-optimal "if by reallocation you cannot make someone better off without making someone else worse off". In Pareto's own words:

"We will say that the members of a collectivity enjoy maximum ophelimity in a certain position when it is impossible to find a way of moving from that position very slightly in such a manner that the ophelimity enjoyed by each of the individuals of that collectivity increases or decreases. That is to say, any small displacement in departing from that position necessarily has the effect of increasing the ophelimity which certain individuals enjoy, and decreasing that which others enjoy, of being agreeable to some, and disagreeable to others."

— Vilfredo Pareto, Manual of Political Econ- 
omy (1906), p.261. 



However, as Sen notes, an economy can be Pareto-optimal, yet still "perfectly disgusting" by any ethical standards. It is important to note that Pareto-optimality is merely a descriptive term, a property of an "allocation", and there are no ethical propositions about the desirability of such allocations inherent within that notion. In other words, there is nothing inherent in Pareto-optimality that implies the maximization of social welfare.

This simple toy model thus also produces a Pareto-optimal state (it is impossible to raise the well-being of anyone except the winner, i.e., the agent with all the money, and vice versa), but the situation is economically undesirable as far as social welfare is concerned.

Note also that, as mentioned above, the OPIP model of Angle (2006, 2002), for example, can also produce a power law tail, depending on the model parameters. Another general way to produce a power law tail in the equilibrium distribution seems to be to diversify the agents, i.e. to consider heterogeneous models, discussed below.

FIG. 40. Results for randomly assigned saving parameters. Reproduced and adapted from Chakrabarti and Chatterjee (2003), available at arXiv:cond-mat/0302147.



D. Heterogeneous kinetic wealth exchange models

1. Random saving propensities 

The models considered above assume that all agents have the same statistical properties. The corresponding equilibrium wealth distribution has in most cases an exponential tail, a form that fits real data well at small and intermediate values of wealth. However, it is possible to conceive generalized models which lead to even more realistic equilibrium wealth distributions. This is the case when agents are diversified by assigning different values of the saving parameter. For instance, Angle (2002) studied a model with a trading rule where diversified parameters $\{\omega_i\}$ occur,

$$\Delta x = \omega_i\,\epsilon\,x_i , \qquad (40)$$



with the direction of wealth flow determined by the wealth of agents $i$ and $j$. Diversified saving parameters were independently introduced by Chatterjee et al. (2003, 2004) by generalizing the model introduced in Chakraborti and Chakrabarti (2000):

$$x_i' = \lambda_i x_i + \epsilon\,[(1-\lambda_i)x_i + (1-\lambda_j)x_j] ,$$
$$x_j' = \lambda_j x_j + (1-\epsilon)\,[(1-\lambda_i)x_i + (1-\lambda_j)x_j] , \qquad (41)$$

corresponding to a

$$\Delta x = (1-\epsilon)(1-\lambda_i)x_i - \epsilon(1-\lambda_j)x_j . \qquad (42)$$

The surprising result is that if the parameters $\{\lambda_i\}$ are suitably diversified, a power law appears in the equilibrium wealth distribution, see Fig. 40. In particular, if the $\lambda_i$ are uniformly distributed in $(0,1)$ the wealth distribution exhibits a robust power-law tail,

$$f(x) \propto x^{-\alpha-1} , \qquad (43)$$

with the Pareto exponent $\alpha = 1$ largely independent of the details of the $\lambda$-distribution. It may be noted that the exponent value unity holds strictly for the tail end of the distribution and not for small values of the income or wealth (where the distribution remains exponential). Also, for a finite number $N$ of agents, there is always an exponential (in $N$) cut-off at the tail end of the distribution. This result is supported by independent theoretical considerations based on different approaches, such as a mean field theory approach (Mohanty (2006), see below for further details) or the Boltzmann equation (Das and Yarlagadda (2003, 2005); Repetowicz et al. (2005); Chatterjee et al. (2005a)). For a derivation of the Pareto law from variational principles, using the KWEM context, see Chakraborti and Patriarca (2009).
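The trading rule of Eq. (41) is straightforward to simulate. The sketch below is only an illustration under stated assumptions: the function names, system size, number of sweeps and the choice of uniformly distributed $\lambda_i$ are ours, and no claim is made that this reproduces the numerical set-up of the cited works.

```python
import random

def trade(x_i, x_j, lam_i, lam_j, eps):
    """One exchange following Eq. (41); returns (x_i', x_j')."""
    pool = (1.0 - lam_i) * x_i + (1.0 - lam_j) * x_j   # wealth put at stake
    return lam_i * x_i + eps * pool, lam_j * x_j + (1.0 - eps) * pool

def simulate(n_agents=100, n_sweeps=500, seed=1):
    """Random pairwise trades with saving propensities drawn uniformly in [0, 1)."""
    rng = random.Random(seed)
    lam = [rng.random() for _ in range(n_agents)]
    wealth = [1.0] * n_agents
    for _ in range(n_sweeps * n_agents):
        i, j = rng.sample(range(n_agents), 2)
        wealth[i], wealth[j] = trade(wealth[i], wealth[j], lam[i], lam[j], rng.random())
    return lam, wealth

lam, wealth = simulate()
richest = max(range(len(wealth)), key=lambda k: wealth[k])
print("total wealth:", sum(wealth))
print("richest agent has lambda =", lam[richest])
```

Each trade conserves $x_i + x_j$, so the total wealth stays fixed. A single snapshot of such a small system is noisy; resolving the power-law tail of Eq. (43) would require larger systems and averaging over time or over realizations.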



2. Power-law distribution as an overlap of Gamma distributions

A remarkable feature of the equilibrium wealth distribution obtained from heterogeneous models, noticed in Chatterjee et al. (2004), is that the individual wealth distribution $f_i(x)$ of the generic $i$-th agent with saving parameter $\lambda_i$ has a well-defined mode and an exponential tail, in spite of the resulting power-law tail of the marginal distribution $f(x) = \sum_i f_i(x)$. In fact, Patriarca et al. (2005) found by numerical simulation that the marginal distribution $f(x)$ can be resolved as an overlap of individual Gamma distributions with $\lambda$-dependent parameters; furthermore, the mode and the average value of the distributions $f_i(x)$ both diverge for $\lambda \to 1$ as $\langle x(\lambda) \rangle \sim 1/(1-\lambda)$ (Chatterjee et al. (2004); Patriarca et al. (2005)). This fact was justified theoretically by Mohanty (2006). Consider the evolution equations (41). In the mean field approximation one can consider that each agent $i$ has an (average) wealth $\langle x_i \rangle = y_i$ and replace the random number $\epsilon$ with its average value $\langle \epsilon \rangle = 1/2$. Indicating with $\tilde{y}_{ij}$ the new wealth of agent $i$ due to the interaction with agent $j$, from Eqs. (41) one obtains



$$\tilde{y}_{ij} = (1/2)(1+\lambda_i)\,y_i + (1/2)(1-\lambda_j)\,y_j . \qquad (44)$$



At equilibrium, for consistency, the average over all the interactions must give back $y_i$,

$$y_i = \frac{1}{N}\sum_j \tilde{y}_{ij} . \qquad (45)$$



Then summing Eq. (44) over $j$ and dividing by the number of agents $N$, one has

$$(1-\lambda_i)\,y_i = \langle (1-\lambda)\,y \rangle , \qquad (46)$$

where $\langle (1-\lambda)\,y \rangle = \sum_j (1-\lambda_j)\,y_j / N$. Since the right hand side is independent of $i$ and this relation holds for arbitrary distributions of $\lambda_i$, the solution is

$$y_i = \frac{C}{1-\lambda_i} , \qquad (47)$$

where $C$ is a constant. Besides proving the dependence of $y_i = \langle x_i \rangle$ on $\lambda_i$, this relation also demonstrates the existence of a power law tail in the equilibrium distribution. If, in the continuous limit, $\lambda$ is distributed in $(0,1)$ with a density $\phi(\lambda)$ $(0 < \lambda < 1)$, then using (47) the (average) wealth distribution is given by

$$f(y) = \phi(\lambda)\,\frac{d\lambda}{dy} = \phi\!\left(1 - \frac{C}{y}\right)\frac{C}{y^2} . \qquad (48)$$

FIG. 41. Wealth distribution in a system of 1000 agents with saving propensities uniformly distributed in the interval $0 < \lambda < 1$. Top left: marginal distribution. Top right: marginal distribution (dotted line) and distributions of wealth of agents with $\lambda \in (j\Delta\lambda, (j+1)\Delta\lambda)$, $\Delta\lambda = 0.1$, $j = 0, \ldots, 9$ (continuous lines). Bottom left: the distribution of wealth of agents with $\lambda \in (0.9, 1)$ has been further resolved into contributions from subintervals $\lambda \in (0.9 + j\Delta\lambda, 0.9 + (j+1)\Delta\lambda)$, $\Delta\lambda = 0.01$. Bottom right: the partial distribution of wealth of agents with $\lambda \in (0.99, 1)$ has been further resolved into those from subintervals $\lambda \in (0.99 + j\Delta\lambda, 0.99 + (j+1)\Delta\lambda)$, $\Delta\lambda = 0.001$.
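The mean-field fixed point can be verified numerically by iterating the consistency condition until it converges. The sketch below is an illustration only: the function names, the set of saving propensities and the number of iterations are arbitrary assumptions, and the update simply applies $y_i \leftarrow (1/2)(1+\lambda_i)\,y_i + (1/2)\langle (1-\lambda)\,y \rangle$.

```python
def mean_field_step(y, lam):
    """Average the post-trade wealth of every agent over all partners j."""
    n = len(y)
    shared = sum((1.0 - l) * w for l, w in zip(lam, y)) / n   # <(1 - lambda) y>
    return [0.5 * (1.0 + l) * w + 0.5 * shared for l, w in zip(lam, y)]

def mean_field_equilibrium(lam, n_iter=2000):
    """Iterate the mean-field map from a uniform initial wealth."""
    y = [1.0] * len(lam)
    for _ in range(n_iter):
        y = mean_field_step(y, lam)
    return y

lam = [0.1 * k for k in range(9)]            # illustrative values 0.0, ..., 0.8
y_eq = mean_field_equilibrium(lam)
products = [(1.0 - l) * w for l, w in zip(lam, y_eq)]
print(products)                              # all entries should agree (the constant C)
```

The map conserves the total wealth, so the constant $C$ is fixed by normalization, $C \sum_i 1/(1-\lambda_i) = \sum_i y_i$.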



Figure 41 illustrates the phenomenon for a system of $N = 1000$ agents with random saving propensities uniformly distributed between 0 and 1. The figure confirms the importance of agents with $\lambda$ close to 1 for producing a power-law probability distribution (Chatterjee et al. (2004); Patriarca et al. (2005)).

However, when considering values of $\lambda$ close enough to 1, the power law can break down for at least two reasons. The first one, illustrated in Fig. 41 (bottom right), is that the power law can be resolved into almost disjoint contributions representing the wealth distributions of single agents. This follows from the finite number of agents used and from the fact that the distance between the average values of the distributions corresponding to two consecutive values of $\lambda$ grows faster than the corresponding widths (Patriarca et al. (2005); Chatterjee et al. (2005b)). The second reason is the finite cut-off $\lambda_M$, always present in a numerical simulation. However, to study this effect, one has to consider a system with a number of agents large enough that it is not possible to resolve the wealth distributions of single agents for the sub-intervals of $\lambda$ considered. This was done by Patriarca et al. (2006) using a system with $N = 10^5$ agents with saving parameters distributed uniformly between 0 and $\lambda_M$. Results are shown in Fig. 42, in which curves from left to right correspond to increasing values of the cut-off $\lambda_M$ from 0.9 to 0.9997. The transition from an exponential to a power-law tail takes place continuously as the cut-off $\lambda_M$ is increased beyond a critical value $\lambda_M \approx 0.9$ toward $\lambda_M = 1$, through the enlargement of the $x$-interval in which the power law is observed.

FIG. 42. Wealth distribution obtained for the uniform saving propensity distributions of $10^5$ agents in the interval $(0, \lambda_M)$.



3. Relaxation process 

Relaxation in systems with constant $\lambda$ had already been studied by Chakraborti and Chakrabarti (2000), where a systematic increase of the relaxation time with $\lambda$, and eventually a divergence for $\lambda \to 1$, was found. In fact, for $\lambda = 1$ no exchange occurs and the system is frozen. The relaxation time scale of a heterogeneous system has been studied by Patriarca et al. (2007). The system is observed to relax toward the same equilibrium wealth distribution from any given arbitrary initial distribution of wealth. If time is measured by the number of transactions $n_t$, the time scale is proportional to the number of agents $N$, i.e. defining time $t$ as the ratio $t = n_t/N$ between the number of trades and the total number of agents $N$ (corresponding to one Monte Carlo cycle or one sweep in molecular dynamics simulations), the dynamics and the relaxation process become independent of $N$. The existence of a natural time scale independent of the system size provides a foundation for using simulations of systems with finite $N$ in order to infer properties of systems with continuous saving propensity distributions and $N \to \infty$.

In a system with uniformly distributed $\lambda$, the wealth distribution of each agent $i$ with saving parameter $\lambda_i$ relaxes toward a different state with characteristic shape $f_i(x)$ (Patriarca et al. (2005); Chatterjee et al. (2005b); Patriarca et al. (2006)) and with a different relaxation time $\tau_i$ (Patriarca et al. (2007)). The differences in the relaxation process can be related to the different relative wealth exchange rates, which by direct inspection of the evolution equations appear to be proportional to $1 - \lambda_i$. Thus, in general, higher saving propensities are expected to be associated with slower relaxation processes, with a relaxation time $\propto 1/(1-\lambda)$.

It is also possible to obtain the relaxation time distribution. If the saving parameters are distributed in $(0,1)$ with a density $\phi(\lambda)$, it follows from probability conservation that $f(x)\,dx = \phi(\lambda)\,d\lambda$, where $x \equiv \langle x \rangle_\lambda$ and $f(x)$ is the corresponding density of average wealth values. In the case of uniformly distributed saving propensities, one obtains

$$f(x) = \phi(\lambda)\,\frac{d\lambda(x)}{dx} = \phi\!\left(1 - \frac{C}{x}\right)\frac{C}{x^2} , \qquad (49)$$

with $C$ the constant of Eq. (47), showing that a uniform saving propensity distribution leads to a power law $f(x) \sim 1/x^2$ in the (average) wealth distribution. In a similar way it is possible to obtain the associated distribution of relaxation times $\psi(\tau)$ for the global relaxation process from the relation $\tau_i \propto 1/(1-\lambda_i)$,

$$\psi(\tau) = \phi(\lambda)\,\frac{d\lambda(\tau)}{d\tau} \propto \frac{1}{\tau^2}\,\phi\!\left(1 - \frac{\tau'}{\tau}\right) , \qquad (50)$$

where $\tau'$ is a proportionality factor. Therefore $\psi(\tau)$ and $f(x)$ are characterized by power law tails in $\tau$ and $x$ respectively, with the same Pareto exponent.
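The change of variables behind these power laws can be checked by direct sampling. In the sketch below, which is only an illustration with arbitrary names and sample sizes, $\lambda$ is drawn uniformly and mapped to $x = C/(1-\lambda)$; a density $C/x^2$ then implies a tail probability $P(X > x) = C/x$ for $x \ge C$, i.e. a Pareto exponent equal to one. The same mapping with $C$ replaced by $\tau'$ applies to the relaxation times.

```python
import random

def sample_average_wealth(n_samples, c=1.0, seed=0):
    """Draw lambda uniformly in [0, 1) and map it to x = C / (1 - lambda)."""
    rng = random.Random(seed)
    return [c / (1.0 - rng.random()) for _ in range(n_samples)]

def tail_fraction(xs, threshold):
    """Empirical estimate of P(X > threshold)."""
    return sum(1 for x in xs if x > threshold) / len(xs)

xs = sample_average_wealth(200_000)
for threshold in (2.0, 10.0, 50.0):
    # Empirical tail next to the prediction C / threshold (here C = 1).
    print(threshold, tail_fraction(xs, threshold), 1.0 / threshold)
```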

In conclusion, the role of the $\lambda$-cut-off is also related to the relaxation process: the slowest convergence rate is determined by the cut-off and is $\propto 1 - \lambda_M$. In numerical simulations of heterogeneous KWEMs, as well as in real wealth distributions, the cut-off is necessarily finite, so that the convergence is fast (Gupta (2008)). On the other hand, for a hypothetical wealth distribution with a power law extending to infinite values of $x$, one cannot find a fast relaxation, because the agents with $\lambda = 1$ give the system an infinite time scale.



E. Microeconomic formulation of Kinetic theory models 



Very recently, Chakrabarti and Chakrabarti (2009) have studied the framework based on microeconomic theory from which the kinetic theory market models could be addressed. They derived the moments of the model by Chakraborti and Chakrabarti (2000) and reproduced the exchange equations used in the model (with fixed savings parameter). In the framework considered, the utility function deals with the behaviour of the agents in an exchange economy.

They start by considering a two-agent exchange economy, where each agent produces a single perishable commodity. Each of these goods is different, and money exists in the economy simply to facilitate transactions. Each of these agents is endowed with an initial amount of money, $M_1 = m_1(t)$ and $M_2 = m_2(t)$. Let agent 1 produce an amount $Q_1$ of commodity 1 only, and agent 2 produce an amount $Q_2$ of commodity 2 only. At each time step $t$, two agents meet randomly to carry out transactions according to their utility maximization principle.

The utility functions are defined as follows: for agent 1, $U_1(x_1, x_2, m_1) = x_1^{\alpha_1} x_2^{\alpha_2} m_1^{\alpha_m}$, and for agent 2, $U_2(y_1, y_2, m_2) = y_1^{\alpha_1} y_2^{\alpha_2} m_2^{\alpha_m}$, where the arguments in both utility functions are the consumption of the first good (i.e. $x_1$ and $y_1$), the consumption of the second good (i.e. $x_2$ and $y_2$), and the amount of money in the agent's possession, respectively. For simplicity, they assume that the utility functions are of the above Cobb-Douglas form with the sum of the powers normalized to 1, i.e. $\alpha_1 + \alpha_2 + \alpha_m = 1$.

Let the commodity prices to be determined in the market be denoted by $p_1$ and $p_2$. The budget constraints are as follows: for agent 1 the budget constraint is $p_1 x_1 + p_2 x_2 + m_1 \le M_1 + p_1 Q_1$ and, similarly, for agent 2 the constraint is $p_1 y_1 + p_2 y_2 + m_2 \le M_2 + p_2 Q_2$. This means that the amount that agent 1 can spend on consuming $x_1$ and $x_2$, added to the amount of money that he holds after trading at time $t+1$ (i.e. $m_1$), cannot exceed the amount of money that he has at time $t$ (i.e. $M_1$) added to what he earns by selling the good he produces (i.e. $Q_1$); the same is true for agent 2.

The basic idea is that both agents try to maximize their respective utility subject to their respective budget constraints, while the invisible hand of the market, i.e. the price mechanism, works to clear the market for both goods (total demand equals total supply for both goods at the equilibrium prices). Agent 1's problem is thus to maximize $U_1(x_1, x_2, m_1)$ subject to $p_1 x_1 + p_2 x_2 + m_1 = M_1 + p_1 Q_1$. Similarly, agent 2's problem is to maximize $U_2(y_1, y_2, m_2)$ subject to $p_1 y_1 + p_2 y_2 + m_2 = M_2 + p_2 Q_2$. Solving these two maximization exercises by the method of Lagrange multipliers and applying the condition that the market remains in equilibrium, the competitive price vector $(\hat{p}_1, \hat{p}_2)$ is found to be $\hat{p}_i = (\alpha_i/\alpha_m)(M_1 + M_2)/Q_i$ for $i = 1, 2$ (Chakrabarti and Chakrabarti (2009)).

The outcomes of such a trading process are then:

1. At optimal prices $(\hat{p}_1, \hat{p}_2)$, $m_1(t) + m_2(t) = m_1(t+1) + m_2(t+1)$, i.e., demand matches supply in all markets at the market-determined price in equilibrium. Since money is also treated as a commodity in this framework, its demand (i.e. the total amount of money held by the two persons after trade) must be equal to what was supplied (i.e. the total amount of money held by them before trade).

2. Suppose a restrictive assumption is made such that $\alpha_1$ in the utility function can vary randomly over time with $\alpha_m$ remaining constant. It readily follows that $\alpha_2$ also varies randomly over time, with the restriction that the sum of $\alpha_1$ and $\alpha_2$ is a constant $(1 - \alpha_m)$. Then, in the money demand equations derived, if we suppose $\alpha_m$ is $\lambda$ and $\alpha_1/(\alpha_1 + \alpha_2)$ is $\epsilon$, it is found that the money evolution equations become

$$m_1(t+1) = \lambda\,m_1(t) + \epsilon\,(1-\lambda)\,(m_1(t) + m_2(t)) ,$$
$$m_2(t+1) = \lambda\,m_2(t) + (1-\epsilon)\,(1-\lambda)\,(m_1(t) + m_2(t)) .$$

For a fixed value of $\lambda$, if $\alpha_1$ (or $\alpha_2$) is a random variable with uniform distribution over the domain $[0, 1-\lambda]$, then $\epsilon$ is also uniformly distributed over the domain $[0,1]$. This limit corresponds to the Chakraborti and Chakrabarti (2000) model, discussed earlier.

3. For the limiting value of $\alpha_m$ in the utility function (i.e. $\alpha_m \to 0$, which implies $\lambda \to 0$), the money transfer equation describing the random sharing of money without saving is obtained, which was studied by Dragulescu and Yakovenko (2000), mentioned earlier.

This demonstrates the equivalence of the two maximization principles of entropy (in physics) and utility (in economics), and is certainly noteworthy.
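The correspondence between the utility-maximization outcome and the kinetic exchange equations can be checked numerically. The sketch below relies on a standard property of Cobb-Douglas utilities with exponents summing to one, namely that each agent spends a share of its budget on every good equal to the corresponding exponent (money being a good with unit price); this property is not spelled out in the text above, and the function names and numerical values are illustrative assumptions.

```python
def equilibrium_prices(alpha_1, alpha_2, alpha_m, M1, M2, Q1, Q2):
    """Market-clearing prices p_i = (alpha_i / alpha_m) (M1 + M2) / Q_i."""
    total_money = M1 + M2
    return ((alpha_1 / alpha_m) * total_money / Q1,
            (alpha_2 / alpha_m) * total_money / Q2)

def money_after_trade(alpha_1, alpha_2, alpha_m, M1, M2, Q1, Q2):
    """Money demand: each agent keeps a share alpha_m of its budget."""
    p1, p2 = equilibrium_prices(alpha_1, alpha_2, alpha_m, M1, M2, Q1, Q2)
    return alpha_m * (M1 + p1 * Q1), alpha_m * (M2 + p2 * Q2)

def kinetic_update(lam, eps, m1, m2):
    """Money evolution equations of the kinetic model with fixed saving."""
    pool = (1.0 - lam) * (m1 + m2)
    return lam * m1 + eps * pool, lam * m2 + (1.0 - eps) * pool

a1, a2, am = 0.3, 0.2, 0.5                  # exponents, summing to 1
M1, M2, Q1, Q2 = 4.0, 6.0, 2.0, 5.0         # illustrative endowments and outputs
print(money_after_trade(a1, a2, am, M1, M2, Q1, Q2))
print(kinetic_update(am, a1 / (a1 + a2), M1, M2))
```

Both lines print the same pair of values, and their sum equals $M_1 + M_2$, in agreement with outcomes 1 and 2 above.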



IX. AGENT-BASED MODELLING BASED ON GAMES 



A. Minority Game models 



1. El Farol Bar Problem



Arthur (1994) introduced the 'El Farol Bar' problem as a paradigm of complex economic systems. In this problem, a population of agents has to decide whether to go to the bar opposite Santa Fe every Thursday night. Due to a limited number of seats, the bar cannot entertain more than X% of the population. If fewer than X% of the population go to the bar, the time spent in the bar is considered to be satisfying and it is better to attend the bar than to stay at home. But if more than X% of the population go to the bar, then it is too crowded and people in the bar have an unsatisfying time. In this second case, staying at home is considered to be a better choice than attending the bar. So, in order to optimise its own utility, each agent has to predict what everybody else will do.

In particular, Arthur was also interested in agents who have bounds on "rationality", i.e. agents who:

• do not have perfect information about their environment: in general they will only acquire information through interaction with the dynamically changing environment;

• do not have a perfect model of their environment;

• have limited computational power, so they cannot work out all the logical consequences of their knowledge;

• have other resource limitations (e.g. memory).

In order to take these limitations into account, each agent is randomly given a fixed menu of models potentially suitable to predict the number of people who will go to the bar given past data (e.g. the same as two weeks ago, the average of the past few weeks, etc.). Each week, each agent evaluates these models against the past data, chooses the one that was the best predictor on this data, and then uses it to predict the number of people who will go to the bar this time. If this prediction is below the X% threshold, the agent decides to go to the bar as well. If the prediction is above it, the agent stays home. Thus, in order to make decisions on whether to attend the bar, all the individuals are equipped with a certain number of "strategies", which provide them with predictions of the attendance in the bar next week, based on the attendance in the past few weeks. As a result, the number who go to the bar oscillates in an apparently random manner around the critical X% mark.

This was one of the first models to take a route different from that of traditional economics.
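The decision procedure described above can be sketched in a few lines. The following is a simplified illustration, not Arthur's original implementation: the menu of predictors ("same as $k$ weeks ago", "average of the last $k$ weeks"), the use of the cumulative absolute error as the measure of past performance, and all names and parameter values are assumptions made for the example.

```python
import random

def make_predictors(max_lag):
    """Illustrative menu of attendance predictors acting on the history list."""
    predictors = []
    for lag in range(1, max_lag + 1):
        predictors.append(lambda h, lag=lag: h[-lag])              # same as `lag` weeks ago
        predictors.append(lambda h, lag=lag: sum(h[-lag:]) / lag)  # mean of last `lag` weeks
    return predictors

def simulate_el_farol(n_agents=100, threshold=60, n_weeks=200,
                      menu_size=4, max_lag=5, seed=0):
    rng = random.Random(seed)
    predictors = make_predictors(max_lag)
    # Each agent receives a fixed random menu of predictor indices.
    menus = [rng.sample(range(len(predictors)), menu_size) for _ in range(n_agents)]
    # Cumulative absolute error of each predictor (identical for all its holders).
    errors = [0.0] * len(predictors)
    history = [rng.randint(0, n_agents) for _ in range(max_lag)]   # arbitrary start
    for _ in range(n_weeks):
        forecasts = [p(history) for p in predictors]
        # An agent attends if its currently best predictor forecasts an uncrowded bar.
        attendance = sum(
            1 for menu in menus
            if forecasts[min(menu, key=lambda k: errors[k])] < threshold
        )
        for k, forecast in enumerate(forecasts):
            errors[k] += abs(forecast - attendance)
        history.append(attendance)
    return history[max_lag:]

series = simulate_el_farol()
print(series[:20])
```

Because a predictor that forecasts low attendance attracts agents and is thereby invalidated, no single predictor can remain the best for everybody, which is the mechanism behind the persistent fluctuations.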



2. Basic Minority game 

The Minority Games (abbreviated MGs) (Challet et al. (2004)) refer to the multi-agent models of financial markets with the original formulation introduced by Challet and Zhang (1997), and all other variants (Coolen (2005); Lamper et al. (2002)), most of which share the principal features that the models are repeated games and agents are inductive in nature. The original formulation of the Minority Game by Challet and Zhang (1997) is sometimes referred to as the "Original Minority Game" or the "Basic Minority Game".

The basic minority game consists of $N$ (an odd natural number) agents, who choose between two decisions at each round of the game, using their own simple inductive strategies. The two decisions could be, for example, "buying" or "selling" commodities/assets, denoted by 0 or 1, at a given time $t$. An agent wins the game if it is one of the members of the minority group; thus at each round the minority group of agents wins the game, and rewards are given to those strategies that predict the winning side. All the agents have access to a finite amount of public information, which is a common bit-string "memory" of the $M$ most recent outcomes, composed of the winning sides in the past few rounds. Thus the agents with finite memory are said to exhibit "bounded rationality" (Arthur (1994)).

FIG. 43. Attendance fluctuation and performances of players in the Basic Minority Game. Plots of (a) attendance and (b) performance of the players (five curves: the best, the worst and three randomly chosen) for the basic minority game with $N = 801$, $M = 6$, $k = 10$ and $T = 5000$. Reproduced from Sysi-Aho et al. (2003b).



Consider, for example, memory $M = 2$; then there are $P = 2^M = 4$ possible "history" bit strings: 00, 01, 10 and 11. A "strategy" consists of a response, i.e., 0 or 1, to each possible history bit string; therefore, there are $G = 2^P = 2^{2^M} = 16$ possible strategies, which constitute the "strategy space". At the beginning of the game, each agent randomly picks $k$ strategies and, after each round of the game, assigns one "virtual" point to every strategy which would have predicted the correct outcome. The actual performance $r$ of the player is measured by the number of times the player wins, and the strategy with which the player wins gets a "real" point. A record is kept of the number of agents $A_1(t)$ who have chosen a particular action, say "selling", denoted by 1, as a function of time (see Fig. 43). The fluctuations in the behaviour of $A_1(t)$ indicate the system's total utility. For example, we can have a situation where only one player is in the minority and all the other players lose. The other extreme case is when $(N-1)/2$ players are in the minority and $(N+1)/2$ players lose. The total utility of the system is obviously greater in the latter case and, from this perspective, the latter situation is more desirable. Therefore, the system is more efficient when there are smaller fluctuations around the mean than when the fluctuations are larger.
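The rules above translate directly into a short simulation. The sketch below is a minimal illustration under stated assumptions, not the code used for the figures: strategies are stored as lookup tables over the $2^M$ histories, ties between equally scored strategies are broken by taking the first one (a random tie-break is a common alternative), and the names and parameter values are arbitrary.

```python
import random

def simulate_minority_game(n_agents=101, memory=3, n_strategies=2,
                           n_rounds=500, seed=0):
    rng = random.Random(seed)
    n_histories = 2 ** memory                      # P = 2^M possible histories
    # A strategy is a lookup table: one response (0 or 1) per history.
    strategies = [[[rng.randint(0, 1) for _ in range(n_histories)]
                   for _ in range(n_strategies)] for _ in range(n_agents)]
    virtual = [[0] * n_strategies for _ in range(n_agents)]   # virtual points
    wins = [0] * n_agents                                      # real performance
    history = rng.randrange(n_histories)           # last M winning sides as an integer
    attendance = []                                # A_1(t): agents choosing action 1
    for _ in range(n_rounds):
        choices = []
        for a in range(n_agents):
            best = max(range(n_strategies), key=lambda s: virtual[a][s])
            choices.append(strategies[a][best][history])
        a1 = sum(choices)
        winner = 1 if a1 < n_agents - a1 else 0    # minority side (N odd, so no ties)
        for a in range(n_agents):
            if choices[a] == winner:
                wins[a] += 1
            for s in range(n_strategies):
                if strategies[a][s][history] == winner:
                    virtual[a][s] += 1
        attendance.append(a1)
        history = ((history << 1) | winner) % n_histories
    return attendance, wins

attendance, wins = simulate_minority_game()
utility = [min(a, 101 - a) for a in attendance]    # number of winners per round
print("mean utility:", sum(utility) / len(utility), "maximum possible:", (101 - 1) // 2)
```

The number of winners in a round equals the size of the minority group, so it can never exceed $(N-1)/2$; the closer $A_1(t)$ stays to $N/2$, the higher the total utility.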

As in the El Farol bar problem, and unlike most traditional economics models, which assume agents are "deductive" in nature, here too a "trial-and-error" inductive thinking approach is implicitly implemented in the process of decision-making when agents make their choices in the games.

FIG. 44. Temporal attendance of A for the genetic approach showing a learning process. Reproduced from Challet and Zhang (1997).



3. Evolutionary minority games 

Challet generalized the basic minority game mentioned above (see Challet and Zhang (1997, 1998)) to include Darwinian selection: the worst player is replaced by a new one after some time steps; the new player is a "clone" of the best player, i.e. it inherits all the strategies but with the corresponding virtual capitals reset to zero (analogous to a newborn baby who, though having all the predispositions of the parents, does not inherit their knowledge). To keep a certain diversity they introduced a mutation possibility in cloning: they allowed one of the strategies of the best player to be replaced by a new one. Since strategies are no longer just recycled among the players, the whole strategy phase space is available for selection. They expected this population to be capable of "learning", since bad players are weeded out with time and the fighting is among the, so to speak, "best" players. Indeed, in Fig. 44 they observed that learning emerged in time. Fluctuations are reduced and saturate, which implies that the average gain for everybody is improved but never reaches the ideal limit.

Li et al. (2000a,b) also studied the minority game in the presence of "evolution". In particular, they examined the behaviour in games in which the dimension of the strategy space, $m$, is the same for all agents and fixed for all time. They found that for all values of $m$ that are not too large, evolution results in a substantial improvement in overall system performance. They also showed that, after evolution, results obeyed a scaling relation among games played with different values of $m$ and different numbers of agents, analogous to that found in the non-evolutionary, adaptive games (see remarks in Section IX A 5). Best system performance still occurred, for a given number of agents, at $m_c$, the same value of the dimension of the strategy space as in the non-evolutionary case, but system performance was nearly an order of magnitude better than the non-evolutionary result. For $m < m_c$, the system evolved to states in which average agent wealth was better than in the random choice game. As $m$ became large, overall system performance approached that of the random choice game.

Li et al. (2000a,b) continued the study of evolution in minority games by examining games in which agents with poorly performing strategies can trade in their strategies for new ones from a different strategy space, which meant allowing for strategies that use information from different numbers of time lags, $m$. They found, in all the games, that after evolution, wealth per agent is high for agents with strategies drawn from small strategy spaces (small $m$), and low for agents with strategies drawn from large strategy spaces (large $m$). In the game played with $N$ agents, wealth per agent as a function of $m$ was very nearly a step function. The transition was found to be at $m = m_t$, where $m_t \simeq m_c - 1$, and $m_c$ is the critical value of $m$ at which $N$ agents playing the game with a fixed strategy space (fixed $m$) have the best emergent coordination and the best utilization of resources. They also found that overall system-wide utilization of resources is independent of $N$. Furthermore, although overall system-wide utilization of resources after evolution varied somewhat depending on some other aspects of the evolutionary dynamics, in the best cases, utilization of resources was on the order of the best results achieved in evolutionary games with fixed strategy spaces.



4. Adaptive minority games 



Sysi-Aho et al. (2003a,b, 2004) presented a simple modification of the basic minority game where the players modify their strategies periodically after every time interval $\tau$, depending on their performances: if a player finds that he is among the fraction $n$ (where $0 < n < 1$) who are the worst performing players, he adapts himself and modifies his strategies. They proposed that the agents use a hybridized one-point genetic crossover mechanism (as shown in Fig. 45), inspired by genetic evolution in biology, to modify the strategies and replace the bad strategies. They studied the performances of the agents under different conditions and investigated how they adapt themselves in order to survive or be the best, by finding new strategies using this highly effective mechanism. They also studied the measure of total utility of the system $U(x_t)$, which is the number of players in the minority group; the total utility of the system reaches its maximum $U_{\max}$ when the highest possible number of players, equal to $(N-1)/2$, win. The system is more efficient when the deviations from the maximum total utility $U_{\max}$ are smaller or, in other words, when the fluctuations in $A_1(t)$ around the mean become smaller.

FIG. 45. Schematic diagram to illustrate the mechanism of one-point genetic crossover for producing new strategies. The strategies $s_i$ and $s_j$ are the parents. We choose the breaking point randomly and, through this one-point genetic crossover, the children $s_k$ and $s_l$ are produced and substitute the parents. Reproduced from Sysi-Aho et al. (2003b).

FIG. 46. Plot to show the time variations of the number of players $A_1$ who choose action 1, with the parameters $N = 1001$, $m = 5$, $s = 10$ and $t = 4000$ for (a) the basic minority game and (b) the adaptive game, where $\tau = 25$ and $n = 0.6$. Reproduced from Sysi-Aho et al. (2003b).



Interestingly, the fluctuations disappear totally and 
the system stabilizes to a state where the total utility 
of the system is at maximum, since at each time step the 
highest number of players win the game (see Fig. 146]) . 
As expected, the behavi our depends on the paramet er 
values for the system fsee lSvsi-Aho et al\ (|2003bll2004) ). 
They used the utility function to study the efficiency and 
dynamics of the game as shown in Fig. 1471 If the par- 
ents are chosen randomly from the pool of strategies then 
the mechanism represents a "one-point genetic crossover" 
and if the parents arc the best strategics then the mech- 
anism represents a "hybridized genetic crossover" . The 
children may replace parents or two worst strategies and 
accordingly four different interesting cases arise: (a) one- 
point genetic crossover with parents "killed", i.e. par- 
ents are replaced by the children, (b) one-point genetic 
crossover with parents "saved" , i.e. the two worst strate- 
gies are replaced by the children but the parents arc 
retained, (c) hybridized genetic crossover with parents 
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FIG. 47. Plot to show the variation of total utility of the 
system with time for the basic minority game for N = 1001,
m = 5, s = 10, t = 5000, and adaptive game, for the same
parameters but different values of $\tau$ and n. Each point
represents a time average of the total utility for separate bins of
size 50 time-steps of the game. The maximum total utility
(= (N − 1)/2) is shown as a dashed line. The data for the
basic minority game is shown in circles. The plus signs are
for $\tau$ = 10 and n = 0.6; the asterisk marks are for $\tau$ = 50 and
n = 0.6; the cross marks for $\tau$ = 10 and n = 0.2 and
triangles for $\tau$ = 50 and n = 0.2. The ensemble average over 70
different samples was taken in each case. Reproduced from
Sysi-Aho et al. (2003b).



"killed" and (d) hybridized genetic crossover with par- 
ents "saved". 
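The four cases (a)-(d) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of Sysi-Aho et al.: the function names `one_point_crossover` and `adapt_pool`, the representation of a strategy as a list of 0/1 entries, and the use of a plain score list to rank strategies are all assumptions made here. How the children's scores are initialised is not specified in the text and is left out.

```python
import random

def one_point_crossover(parent_a, parent_b, rng):
    """Cut both parents at one random breaking point and swap the tails."""
    point = rng.randrange(1, len(parent_a))  # 1 .. len-1, so both parents contribute
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def adapt_pool(pool, scores, hybridized, save_parents, rng):
    """Return a new strategy pool after one crossover step (hypothetical helper).

    hybridized=False, save_parents=False -> case (a)
    hybridized=False, save_parents=True  -> case (b)
    hybridized=True,  save_parents=False -> case (c)
    hybridized=True,  save_parents=True  -> case (d)
    """
    worst_first = sorted(range(len(pool)), key=lambda i: scores[i])
    if hybridized:
        pa, pb = worst_first[-1], worst_first[-2]   # the two best strategies
    else:
        pa, pb = rng.sample(range(len(pool)), 2)    # two random strategies
    child_a, child_b = one_point_crossover(pool[pa], pool[pb], rng)
    # "saved": children overwrite the two worst; "killed": they overwrite the parents.
    # Note: with random parents, a parent may itself be one of the two worst.
    targets = (worst_first[0], worst_first[1]) if save_parents else (pa, pb)
    new_pool = list(pool)
    new_pool[targets[0]], new_pool[targets[1]] = child_a, child_b
    return new_pool
```

For example, `adapt_pool(pool, scores, hybridized=True, save_parents=True, rng=random.Random(0))` corresponds to case (d): the two best strategies are crossed and the children overwrite the two worst ones, so the pool size stays fixed.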

In order to determine which mechanism is the most
efficient, we have made a comparative study of the four
cases mentioned above. We plot the attendance as a
function of time for the different mechanisms in Fig. 48.
In Fig. 49 we show the total utility of the system in each
of the cases (a)-(d), where we have plotted results of the
average over 100 runs and each point in the utility curve
represents a time average taken over a bin of length 50
time-steps. The simulation time is doubled from that
in Fig. 48 in order to expose the asymptotic behaviour
better. On the basis of Figs. 48 and 49 we find that
case (d) is the most efficient. In order to investigate
what happens at the level of an individual agent,
we created a competitive surrounding, a "test" situation,
where after T = 3120 time-steps six players begin to
adapt and modify their strategies such that three use
the hybridized genetic crossover mechanism and the other
three the one-point genetic crossover, where children replace
the parents. The rest of the players play the basic
minority game. In this case it turns out that in the end
the best players are those who use the hybridized
mechanism, the second best are those using the one-point
mechanism, and the worst players are those who do not adapt at all.
In addition it turns out that the competition amongst the
players who adapt using the hybridized genetic crossover
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FIG. 48. Plots of the attendances by choosing parents ran- 
domly (a) and (b), and using the best parents in a player's 
pool (c) and (d). In (a) and (c) case parents are replaced 
by children and in (b) and (d) case children replace the two 
worst strategies. Simulations have been done with N = 801,
M = 6, k = 16, t = 40, n = 0.4 and T = 10000.
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FIG. 49. Plots of the scaled utilities of the four different 
mechanisms in comparison with that of the basic minority 
game. Each curve represents an ensemble average over 100 
runs and each point in a curve is a time average over a bin 
of length 50 time-steps. In the inset, the quantity (1 − U) is
plotted against scaled time in the double logarithmic scale.
Simulations are done with N = 801, M = 6, k = 16, t = 40,
n = 0.4 and T = 20000. Reproduced from Sysi-Aho et al.
(2003b).



mechanism is severe. 

It should be noted that the mechanism of evolution of
strategies is considerably different from earlier attempts
such as Challet and Zhang (1997) or Li et al. (2000a,b).
This is because in this mechanism the strategies are
changed by the agents themselves and even though the
strategy space evolves continuously, its size and
dimensionality remain the same.



Due to the simplicity of these models (Sysi-Aho et al.
(2003a,b,c, 2004)), a lot of freedom is found in
modifying the models to make the situations more realistic
and applicable to many real dynamical systems, and not
only financial markets. Many details in the model can
be fine-tuned to imitate the real markets or behaviour of
other complex systems. Many other sophisticated
models based on these games can be set up and implemented,
which show a great potential over the commonly adopted
statistical techniques in analyses of financial markets.



5. Remarks 

For modelling purposes, the minority game models
were meant to serve as a class of simple models
which could produce some macroscopic features observed
in the real financial markets, which included the
fat-tail price return distribution and volatility clustering
(Challet et al. (2004); Coolen (2005)). Despite the hectic
activity (Challet and Zhang (1998); Challet et al.
(2000)) they have failed to capture or reproduce most
important stylized facts of the real markets. However, in
the physicists' community, they have become an interesting
and established class of models where the physics of
disordered systems applies (Cavagna et al. (1999); Challet et al.
(2000)), lending a large amount of physical insight
(Savit et al. (1999); Martino et al. (2004)). Since in the
BMG model a Hamiltonian function could be defined and
analytic solutions could be developed in some regimes of
the model, the model was viewed with a more physical
picture. In fact, it is characterized by a clear two-phase
structure with very different collective behaviours in the
two phases, as in many known conventional physical
systems (Savit et al. (1999); Cavagna et al. (1999)).

Savit et al. (1999) first found that the macroscopic
behaviour of the system does not depend independently on
the parameters N and M, but instead depends on the
ratio

$$\alpha \equiv \frac{2^M}{N} = \frac{P}{N}, \qquad (51)$$

which serves as the most important control parameter
in the game. The variance in the attendance (see also
Sysi-Aho et al. (2003c)) or volatility $\sigma^2/N$, for different
values of N and M, depends only on the ratio $\alpha$. Fig. 50
shows a plot of $\sigma^2/N$ against the control parameter $\alpha$,
where the data collapse of $\sigma^2/N$ for different values of
N and M is clearly evident. The dotted line in Fig. 50
corresponds to the "coin-toss" limit (random choice or
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FIG. 50. The simulation results of the variance in attendance 
$\sigma^2/N$ as a function of the control parameter $\alpha = 2^M/N$ for
games with k = 2 strategies for each agent, ensemble averaged
over 100 sample runs. Dotted line shows the value of volatility
in random choice limit. Solid line shows the critical value
of $\alpha = \alpha_c \simeq 0.3374$. Reproduced from Yeung and Zhang
arxiv: 0811.1479.



pure chance limit), in which agents play by simply
making random decisions (by coin-tossing) at every round
of the game. This value of $\sigma^2/N$ in the coin-toss limit can
be obtained by simply assuming a binomial distribution
of the agents' binary actions, with probability 0.5, such
that $\sigma^2/N = 0.5(1 - 0.5) \cdot 4 = 1$. When $\alpha$ is small, the
value of $\sigma^2/N$ of the game is larger than the coin-toss
limit, which implies that the collective behaviour of the agents is
worse than random choice. In the early literature, this
was popularly called the worse-than-random regime.
When $\alpha$ increases, the value of $\sigma^2/N$ decreases and
enters a region where agents perform better than
random choice, which was popularly called the
better-than-random regime. The value of $\sigma^2/N$ reaches
a minimum value which is substantially smaller than the
coin-toss limit. When $\alpha$ increases further, the value of
$\sigma^2/N$ increases again and approaches the coin-toss limit.
This allowed one to identify two phases in the Minority
Game, separated by the minimum value of $\sigma^2/N$ in
the graph. The value of $\alpha$ where the rescaled volatility
attained its minimum was denoted by $\alpha_c$, which
represented the phase transition point; $\alpha_c$ has been shown to
have a value of 0.3374... (for k = 2) by analytical
calculations (Challet et al. (2000)).
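The dependence of the rescaled volatility on the ratio of Eq. (51) can be explored with a minimal sketch of the basic minority game. The code below is an illustration under stated assumptions, not the setup used for Fig. 50: actions are coded as ±1 so that the coin-toss limit is 1, every strategy that predicted the minority side gains one point, ties between equally scored strategies are broken at random, the initial history is random, and the function name and default parameter values are chosen here for convenience.

```python
import random

def minority_game_volatility(n_agents=101, memory=3, n_strategies=2,
                             steps=2000, seed=0):
    """Return the attendance variance divided by N for one run (N must be odd)."""
    rng = random.Random(seed)
    n_hist = 2 ** memory
    # strategies[i][s][h] is the action (+1 or -1) of strategy s of agent i
    # when the last `memory` outcomes encode to history index h.
    strategies = [[[rng.choice((-1, 1)) for _ in range(n_hist)]
                   for _ in range(n_strategies)] for _ in range(n_agents)]
    scores = [[0] * n_strategies for _ in range(n_agents)]
    history = rng.randrange(n_hist)
    attendance = []
    for _ in range(steps):
        total = 0
        for i in range(n_agents):
            best = max(scores[i])
            candidates = [s for s in range(n_strategies) if scores[i][s] == best]
            total += strategies[i][rng.choice(candidates)][history]
        winner = -1 if total > 0 else 1      # the minority side wins
        for i in range(n_agents):
            for s in range(n_strategies):
                if strategies[i][s][history] == winner:
                    scores[i][s] += 1        # reward correct predictions
        history = ((history << 1) | (1 if winner == 1 else 0)) % n_hist
        attendance.append(total)
    tail = attendance[steps // 2:]           # discard the first half as transient
    mean = sum(tail) / len(tail)
    variance = sum((a - mean) ** 2 for a in tail) / len(tail)
    return variance / n_agents
```

Calling the function for increasing `memory` at fixed `n_agents` scans the control parameter, since the ratio of Eq. (51) is `2 ** memory / n_agents`; a single run is noisy, so an ensemble average over seeds would be needed to approach a curve like the one in Fig. 50.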



Besides these collective behaviours, physicists also became
interested in the dynamics of the games such as
crowd vs anti-crowd movement of agents, periodic
attractors, etc. (Johnson et al. (1999b,a); Hart et al. (2001)).
In this way, the Minority Games serve as a useful tool 
and provide a new direction for physicists in viewing and 
analysing the underlying dynamics of complex evolving 
systems such as the financial markets. 



The Kolkata Paise Restaurant (KPR) problem (Chakrabarti et al. (2009);
Ghosh and Chakrabarti (2009); Ghosh et al. (2010a,b))
is a repeated game, played between a large number N
of agents having no interaction amongst themselves. In the
KPR problem, prospective customers (agents) choose
from N restaurants each evening simultaneously (in
parallel decision mode); N is fixed. Each restaurant
has the same price for a meal but a different rank
(agreed upon by all customers) and can serve only
one customer any evening. Information regarding the
customer distributions for earlier evenings is available
to everyone. Each customer's objective is to go to the
restaurant with the highest possible rank while avoiding
the crowd so as to be able to get dinner there. If more
than one customer arrives at any restaurant on any
evening, one of them is randomly chosen (each of them
is treated anonymously) and is served. The rest do not
get dinner that evening.

In Kolkata, there were very cheap and fixed rate "Paise 
Restaurants" that were popular among the daily labour- 
ers in the city. During lunch hours, the labourers used to 
walk (to save the transport costs) to one of these restau- 
rants and would miss lunch if they got to a restaurant 
where there were too many customers. Walking down to 
the next restaurant would mean failing to report back to 
work on time! Paise is the smallest Indian coin and there 
were indeed some well-known rankings of these restau- 
rants, as some of them would offer tastier items compared 
to the others. A more general example of such a problem 
would be when the society provides hospitals (and beds) 
in every locality but the local patients go to hospitals 
of better rank (commonly perceived) elsewhere, thereby 
competing with the local patients of those hospitals. Un- 
availability of treatment in time may be considered as 
lack of the service for those people and consequently as 
(social) wastage of service by those unattended hospitals. 

A dictator's solution to the KPR problem is the follow- 
ing: the dictator asks everyone to form a queue and then 
assigns each one a restaurant with rank matching the se- 
quence of the person in the queue on the first evening. 
Then each person is told to go to the next ranked restau- 
rant in the following evening (for the person in the last 
ranked restaurant this means going to the first ranked 
restaurant). This shift proceeds then continuously for 
successive evenings. This is clearly one of the most
efficient solutions (with utilization fraction $\bar{f}$ of the services
by the restaurants equal to unity) and the system arrives
at this solution immediately (from the first evening
itself). However, in reality this cannot be the true
solution of the KPR problem, where each agent decides on his
own (in parallel or democratically) every evening, based
on complete information about past events. In this game,
the customers try to evolve a learning strategy to
eventually get dinners at the best possible ranked restaurant,
avoiding the crowd. It is seen that the evolution of these
strategies takes considerable time to converge and even then the
eventual utilization fraction $\bar{f}$ is far below unity.

Let the symmetric stochastic strategy chosen by each
agent be such that at any time t, the probability $p_k(t)$ to
arrive at the k-th ranked restaurant is given by

$$p_k(t) = \frac{1}{z}\left[k^\alpha \exp\left(-\frac{n_k(t-1)}{T}\right)\right], \quad z = \sum_{k=1}^{N}\left[k^\alpha \exp\left(-\frac{n_k(t-1)}{T}\right)\right], \qquad (52)$$

where $n_k(t)$ denotes the number of agents arriving at the
k-th ranked restaurant in period t, $T > 0$ is a scaling
factor and $\alpha \geq 0$ is an exponent.

For any natural number $\alpha$ and $T \to \infty$, Eq. (52) reduces to
$p_k(t) = k^\alpha / \sum_{k'} k'^\alpha$, i.e. an agent goes to the k-th
ranked restaurant with a probability that depends only on its rank.

If an agent selects any restaurant with equal probability
p, then the probability that a single restaurant is chosen
by m agents is given by

$$\Delta(m) = \binom{N}{m} p^m (1-p)^{N-m}. \qquad (53)$$



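Therefore, the probability that a restaurant with rank k
is not chosen by any of the agents will be given by

$$\Delta_k(m=0) = \binom{N}{0}(1-p_k)^N \simeq \exp\left(-\frac{k^\alpha N}{\tilde{N}}\right) \text{ as } N \to \infty, \quad p_k = \frac{k^\alpha}{\sum_{k'} k'^\alpha}, \qquad (54)$$

where $\tilde{N} = \sum_{k=1}^{N} k^\alpha \simeq \int_0^N k^\alpha \, dk = N^{\alpha+1}/(\alpha+1)$. Hence

$$\Delta_k(m=0) = \exp\left(-\frac{k^\alpha(\alpha+1)}{N^\alpha}\right). \qquad (55)$$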

Therefore the average fraction of agents getting dinner in
the k-th ranked restaurant is given by

$$\bar{f}_k = 1 - \Delta_k(m=0). \qquad (56)$$

Naturally for $\alpha = 0$, the problem corresponds to
random choice, $\bar{f}_k = 1 - e^{-1}$, giving $\bar{f} = \sum_k \bar{f}_k/N \simeq 0.63$,
and for $\alpha = 1$, $\bar{f}_k = 1 - e^{-2k/N}$, giving $\bar{f} = \sum_k \bar{f}_k/N \simeq$
0.58.
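The two limiting values can be checked numerically. The sketch below is illustrative only: it evaluates the rank average of Eq. (56) with the large-N form of Eq. (55), and compares it with a direct Monte Carlo draw of the rank-only strategy obtained from Eq. (52) when T goes to infinity. The function names, the choice N = 1000 and the number of evenings are assumptions made here.

```python
import math
import random

def analytic_utilization(n, alpha):
    """Rank average of Eq. (56), using the large-N expression of Eq. (55)."""
    return sum(1.0 - math.exp(-(alpha + 1) * k ** alpha / n ** alpha)
               for k in range(1, n + 1)) / n

def simulated_utilization(n, alpha, evenings=50, seed=0):
    """Monte Carlo estimate: each agent picks rank k with weight k**alpha."""
    rng = random.Random(seed)
    ranks = list(range(1, n + 1))
    weights = [k ** alpha for k in ranks]
    total = 0.0
    for _ in range(evenings):
        # A restaurant is utilized if at least one agent shows up.
        visited = set(rng.choices(ranks, weights=weights, k=n))
        total += len(visited) / n
    return total / evenings
```

With N = 1000 the analytic average equals $1 - e^{-1} \simeq 0.63$ for $\alpha = 0$ and comes out at roughly 0.57 for $\alpha = 1$, close to the value quoted above; the Monte Carlo estimates should agree with these to within sampling noise.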

In summary, in the KPR problem the decision
made by each agent on each evening t is independent
and is based on the information about the rank k of
the restaurants and their occupancy given by the
numbers $n_k(t-1), \ldots, n_k(0)$. For several stochastic
strategies, only $n_k(t-1)$ is utilized and each agent chooses the
k-th ranked restaurant with probability $p_k(t)$ given by
Eq. (52). The utilization fraction $f_k$ of the k-th ranked
restaurants on every evening is studied and their
average (over k) distributions $D(f)$ are studied numerically,
as well as analytically, and one finds (Chakrabarti et al.
(2009); Ghosh and Chakrabarti (2009); Ghosh et al.
(2010a)) their distributions to be Gaussian with the most
probable utilization fraction $\bar{f} \simeq$ 0.63, 0.58 and 0.46 for
the cases with $\alpha = 0$, $T \to \infty$; $\alpha = 1$, $T \to \infty$; and $\alpha = 0$,
$T \to 0$ respectively. For the stochastic crowd-avoiding
strategy discussed in Ghosh et al. (2010b), where
$p_k(t) = 1/n_k(t-1)$ for $k = k_0$, the restaurant visited by the agent
last evening, and $= 1/(N-1)$ for all other restaurants
($k \neq k_0$), one gets the best utilization fraction $\bar{f} \simeq 0.8$,
and the analytical estimates for $\bar{f}$ in these limits agree
very well with the numerical observations. Also, the time
required to converge to the above value of $\bar{f}$ is
independent of N.
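A minimal simulation of the stochastic crowd-avoiding strategy is sketched below, together with the dictated rotation as a baseline. It is a sketch under assumptions rather than the procedure of Ghosh et al. (2010b): an agent who shared restaurant $k_0$ with $n$ agents last evening is taken to return there with probability $1/n$ and otherwise to move to one of the other $N - 1$ restaurants chosen uniformly, the first evening is a uniformly random choice, and the function names, N = 500 and the number of evenings are chosen here for illustration.

```python
import random
from collections import Counter

def dictated_utilization(n, evening):
    """Rotation schedule: agent i visits rank (i + evening) mod n."""
    visited = {(agent + evening) % n for agent in range(n)}
    return len(visited) / n

def crowd_avoiding_utilization(n=500, evenings=200, seed=0):
    """Average utilization over the second half of the run."""
    rng = random.Random(seed)
    position = [rng.randrange(n) for _ in range(n)]   # random first evening
    history = []
    for _ in range(evenings):
        counts = Counter(position)
        history.append(len(counts) / n)   # each occupied restaurant serves one agent
        new_position = []
        for k0 in position:
            if rng.random() < 1.0 / counts[k0]:
                new_position.append(k0)   # return to last evening's restaurant
            else:
                # shift by 1 .. n-1: uniform over the other n - 1 restaurants
                new_position.append((k0 + rng.randrange(1, n)) % n)
        position = new_position
    tail = history[evenings // 2:]
    return sum(tail) / len(tail)
```

The rotation gives a utilization of exactly 1 on every evening, while the crowd-avoiding run is expected to settle near the value of about 0.8 quoted above after a short transient.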

The KPR problem has similarity with the Minority
Game Problem (Arthur (1994); Challet et al. (2004)) as
in both the games, herding behaviour is punished and
diversity is encouraged. Also, both involve learning of the
agents from the past successes etc. Of course, KPR has
some simple exact solution limits, a few of which are
discussed here. The real challenge is, of course, to design
algorithms of learning mixed strategies (e.g., from the pool
discussed here) by the agents so that the fair social norm
emerges eventually (in $N^0$ or $\ln N$ order time) even when
everyone decides on the basis of their own information
independently. As we have seen, some naive strategies
give better values of $\bar{f}$ compared to most of the "smarter"
strategies like strict crowd-avoiding strategies, etc. This
observation in fact compares well with earlier
observations in minority games (see e.g. Satinover and Sornette
(2007)).



It may be noted that all the stochastic strategies, being
parallel in computational mode, have the advantage that
they converge to a solution in fewer time steps (~ $N^0$ or
$\ln N$) while for deterministic strategies the convergence
time is typically of order N, which renders such
strategies useless in the truly macroscopic ($N \to \infty$) limits.
However, deterministic strategies are useful when N is
small and rational agents can design appropriate
punishment schemes for the deviators (see Kandori (2008)).

The study of the KPR problem shows that a
dictated solution leads to one of the best possible solutions
to the problem, with each agent getting his dinner at the
best ranked restaurant with a period of N evenings, and
with the best possible value of $\bar{f}$ (= 1) starting from the first
evening itself. The parallel decision strategies (employing
evolving algorithms by the agents, and past information,
e.g., of n(t)), which are necessarily parallel among the
agents and stochastic (as in democracy), are less efficient
($\bar{f} < 1$; the best one, discussed in Ghosh et al. (2010b),
gives $\bar{f} \simeq 0.8$ only). Note here that the time required
is not dependent on N. We also note that most of the
"smarter" strategies lead to much lower efficiency.



X. CONCLUSIONS AND OUTLOOK 

Agent-based models of order books are a good
example of interactions between ideas and methods that
are usually linked either to Economics and Finance
(microstructure of markets, agent interaction) or to Physics
(reaction-diffusion processes, deposition-evaporation
process, kinetic theory of gases). As of today, existing
models exhibit a trade-off between "realism" and calibration
in their mechanisms and processes (empirical models such
as Mike and Farmer (2008)), and explanatory power of
simple observed behaviours (Cont and Bouchaud (2000);
Cont (2007) for example). In the first case, some of the
"stylized facts" may be reproduced, but using empirical
processes that may not be linked to any behaviour
observed on the market. In the second case, these are
only toy models that cannot be calibrated on data. The
mixing of many features, as in Lux and Marchesi (2000)
and as is usually the case in behavioural finance, leads
to poorly tractable models where the sensitivity to one
parameter is hardly understandable. Therefore, no
empirical model can properly tackle empirical facts such as
volatility clustering. Importing toy model features
explaining volatility clustering or market interactions into
order book models is yet to be done. Finally, let us also
note that to our knowledge, no agent-based model of
order books deals with the multidimensional case.
Implementing agents trading simultaneously several assets in
a way that reproduces empirical observations on
correlation and dependence remains an open challenge.

We believe this type of modelling is crucial for future
developments in finance. The financial crisis that
occurred in 2007-2008 is expected to create a shock in
classic modelling in Economics and Finance. Many scientists
have expressed their views on this subject (e.g. Bouchaud
(2008); Lux and Westerhoff (2009); Farmer and Foley
(2009)) and we believe as well that the agent-based models
we have presented here will be at the core of future
modelling. As illustrations, let us mention Iori et al. (2006),
which models the interbank market and investigates
systemic risk, Thurner et al. (2009), which investigates the
effects of use of leverage and margin calls on the
stability of a market, and Yakovenko and Rosser (2009), which
provides a brief overview of the study of wealth
distributions and inequalities. No doubt these will be followed
by many other contributions.
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